While using awk some of the folder names are missing - regex

I am trying to extract the folder name that appears after /share/volume_repository/ (for example, c20_testprd_108) from lines like these:
/share/volume_repository/c20_testprd_108_2018-01-0912:15:51.469
/share/volume_repository/test_testprd_20_2019-03-0504:03:24.24
/share/volume_repository/c20_testprd_109_2018-01-0912:11:32.915
/share/volume_repository/hp_testprd_2003_2018-10-2917:51:24.724
/share/volume_repository/hp_testprd_3335_2019-01-2220:00:17.139
/share/volume_repository/hp_testprd_2002_2018-10-2917:49:15.605
/share/shared_volume_repository/fnolan_ha_testprd_02_2018-06-2621:31:23.405
I tried a combination of cut and awk, but when I use _20 as the field separator in awk it truncates some of the folder names:
cat abc | cut -d '/' -f 4 | awk -F '_20' '{print $1}'
Output:
c20_testprd_108
test_testprd
c20_testprd_109
hp_testprd
hp_testprd_3335
hp_testprd
fnolan_ha_testprd_02
The expected output is
c20_testprd_108
test_testprd_20
c20_testprd_109
hp_testprd_2003
hp_testprd_3335
hp_testprd_2002
fnolan_ha_testprd_02

Could you please try the following; it is written and tested with the shown samples.
awk '
match($0,/\/share\/(shared_)?volume_repository\/[^:]*/){
value=substr($0,RSTART,RLENGTH)
gsub(/.*\/|_[0-9]+-[0-9]+-[0-9]+$/,"",value)
print value
}
' Input_file
Explanation: here is a detailed explanation of the above.
awk ' ##Starting awk program from here.
match($0,/\/share\/(shared_)?volume_repository\/[^:]*/){ ##Using match function to match regex from share till colon here.
value=substr($0,RSTART,RLENGTH) ##Creating var value with sub-string for current line.
gsub(/.*\/|_[0-9]+-[0-9]+-[0-9]+$/,"",value) ##Globally substituting everything till / OR last date timings from value here.
print value ##Printing value here.
}
' Input_file ##Mentioning Input_file name here.

With GNU awk:
awk '{print $4}' FS='/|_....-..-....' file
Output:
c20_testprd_108
test_testprd_20
c20_testprd_109
hp_testprd_2003
hp_testprd_3335
hp_testprd_2002
fnolan_ha_testprd_02
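To see how that field separator splits one of the sample lines, here is a quick check (using one of the sample paths from above):
echo '/share/volume_repository/hp_testprd_2003_2018-10-2917:51:24.724' |
awk '{print $4}' FS='/|_....-..-....'
hp_testprd_2003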

Catch specific string using regex

I have multiple boards. Inside my bash script, I want to catch my root filesystem name using regex. When I run cat /proc/cmdline, I get this:
BOOT_IMAGE=/vmlinuz-5.15.0-57-generic root=/dev/mapper/vgubuntu-root ro quiet splash vt.handoff=7
I just want to select /dev/mapper/vgubuntu-root
So far I have managed to catch root=/dev/mapper/vgubuntu-root using this regex:
\broot=[^ ]+
You can use your regex in sed with a capture group:
sed -E 's~.* root=([^ ]+).*~\1~' /proc/cmdline
/dev/mapper/vgubuntu-root
Another option is to use awk (this should work in any awk):
awk 'match($0, /root=[^ ]+/) {
print substr($0, RSTART+5, RLENGTH-5)
}' /proc/cmdline
# if your string is always 2nd field then a simpler one
awk '{sub(/^[^=]+=/, "", $2); print $2}' /proc/cmdline
1st solution: With your shown samples, in GNU awk please try the following awk code.
awk -v RS='[[:space:]]+root=[^[:space:]]+' '
RT && split(RT,arr,"="){
print arr[2]
}
' Input_file
2nd solution: With GNU grep you could try the following, using the -o and -P options to print only the match and to enable PCRE regex. The regex ^.*?[[:space:]]root=\K\S+ uses \K to forget everything matched up to root= and keep the rest of the value, as required.
grep -oP '^.*?[[:space:]]root=\K\S+' Input_file
3rd solution: In case your Input_file is always the same as the shown samples, then try this simple awk using the field separator(s) concept.
awk -F' |root=' '{print $3}' Input_file
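To see why it is $3 and not $2: the space and root= are two adjacent separators, so $2 ends up empty. A minimal check with a shortened sample line:
echo 'BOOT_IMAGE=/vmlinuz root=/dev/mapper/vgubuntu-root ro quiet' |
awk -F' |root=' '{print "$2=[" $2 "]  $3=[" $3 "]"}'
$2=[]  $3=[/dev/mapper/vgubuntu-root]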
If the second field has the value, using awk you can split and check for root
awk '
{
n=split($2,a,"=")
if (n==2 && a[1]=="root"){
print a[2]
}
}
' file
Output
/dev/mapper/vgubuntu-root
Or using GNU-awk with a capture group
awk 'match($0, /(^|\s)root=(\S+)/, a) {print a[2]}' file
Since you are using Linux, you can use GNU grep:
grep -oP '\broot=\K\S+'
where -o prints only the matched parts and -P sets the regex engine to PCRE. Details:
\b - word boundary
root= - a fixed string
\K - match reset operator discarding the text matched so far
\S+ - one or more non-whitespace chars.
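For example, run against /proc/cmdline directly (output shown for the sample line from the question):
grep -oP '\broot=\K\S+' /proc/cmdline
/dev/mapper/vgubuntu-root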
another awk solution, using good ole' FS / OFS :
-- no PCRE, capture groups, match(), g/sub(), or substr() needed
echo 'BOOT_IMAGE=/vmlinuz-5.15.0-57-generic root=/dev/mapper/vgubuntu-root ro quiet splash vt.handoff=7' |
mawk NF=NF FS='^[^=]+=[^=]+=| [^/]+$' OFS=
/dev/mapper/vgubuntu-root
if you're very very certain the structure has root=, then :
gawk NF=NF FS='^.+root=| .+$' OFS=
/dev/mapper/vgubuntu-root
if you like doing it the RS way instead :
nawk '$!NF = $NF' FS== RS=' [^/]+\n'
/dev/mapper/vgubuntu-root

print the last letter of each word to make a string using `awk` command

I have this line
UDACBG UYAZAM DJSUBU WJKMBC NTCGCH DIDEVO RHWDAS
I am trying to print the last letter of each word to make a string, using this awk command:
awk '{ print substr($1,6) substr($2,6) substr($3,6) substr($4,6) substr($5,6) substr($6,6) }'
In case I don't know how many characters a word contains, what is the correct command to print the last character of a column? And instead of repeating the substr command, how can I use it only once to print specific characters from different columns?
If you have just this one single line to handle you can use
awk '{for (i=1;i<=NF;i++) r = r "" substr($i,length($i))} END{print r}' file
If you have multiple lines in the input:
awk '{r=""; for (i=1;i<=NF;i++) r = r "" substr($i,length($i)); print r}' file
Details:
{for (i=1;i<=NF;i++) r = r "" substr($i,length($i)) - iterate over all fields in the current record, i is the field ID, $i is the field value, and all last chars of each field (retrieved with substr($i,length($i))) are appended to r variable
END{print r} prints the r variable once awk script finishes processing.
In the second solution, r value is cleared upon each line processing start, and its value is printed after processing all fields in the current record.
See the demo:
#!/bin/bash
s='UDACBG UYAZAM DJSUBU WJKMBC NTCGCH DIDEVO RHWDAS'
awk '{for (i=1;i<=NF;i++) r = r "" substr($i,length($1))} END{print r}' <<< "$s"
Output:
GMUCHOS
Using GNU awk and gensub:
$ gawk '{print gensub(/([^ ]+)([^ ])( |$)/,"\\2","g")}' file
Output:
GMUCHOS
1st solution: With GNU awk you could try the following awk program, written and tested with the shown samples.
awk -v RS='.([[:space:]]+|$)' 'RT{gsub(/[[:space:]]+/,"",RT);val=val RT} END{print val}' Input_file
Explanation: Set the record separator to any character followed by whitespace OR the end of the line. Then, as per the OP's requirement, strip the unnecessary newlines/spaces from the fetched separator (RT) and keep appending it to the variable val; finally, once awk has finished reading the whole Input_file, print the variable's value.
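A quick run on the sample line (again assuming GNU awk, since a regex RS and the RT variable are gawk features):
s='UDACBG UYAZAM DJSUBU WJKMBC NTCGCH DIDEVO RHWDAS'
awk -v RS='.([[:space:]]+|$)' 'RT{gsub(/[[:space:]]+/,"",RT);val=val RT} END{print val}' <<< "$s"
GMUCHOS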
2nd solution: Set the record separator to null and use the match function with the regex (.[[:space:]]+)|(.$) to pick out only the last letters; with each match found, keep appending the matched value to a variable, and at the end, in the END block of the awk program, print the variable's value.
awk -v RS= '
{
while(match($0,/(.[[:space:]]+)|(.$)/)){
val=val substr($0,RSTART,RLENGTH)
$0=substr($0,RSTART+RLENGTH)
}
}
END{
gsub(/[[:space:]]+/,"",val)
print val
}
' Input_file
Simple substitutions on individual lines are the job sed exists to do:
$ sed 's/[^ ]*\([^ ]\) */\1/g' file
GMUCHOS
using many tools
$ tr -s ' ' '\n' <file | rev | cut -c1 | paste -sd'\0'
GMUCHOS
Separate the words onto individual lines, reverse each one so that we can pick the first character easily, and finally paste them back together without a delimiter. Not the shortest solution, but I think the most trivial one...
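To see the intermediate steps, this is what the stream looks like just before the final paste (assuming the single sample line is in file):
tr -s ' ' '\n' <file | rev | cut -c1
G
M
U
C
H
O
S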
I would harness GNU AWK for this as follows. Let file.txt content be
UDACBG UYAZAM DJSUBU WJKMBC NTCGCH DIDEVO RHWDAS
then
awk 'BEGIN{FPAT="[[:alpha:]]\\>";OFS=""}{$1=$1;print}' file.txt
output
GMUCHOS
Explanation: Inform AWK to treat any alphabetic character at the end of a word as a field, and to use the empty string as the output field separator. $1=$1 is used to trigger line rebuilding with the specified OFS. If you want to know more about the start/end-of-word operators, read GNU Regexp Operators.
(tested in gawk 4.2.1)
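To see what the fields become under that FPAT, you can print them one by one (a quick check, gawk only, since FPAT and \> are GNU extensions):
gawk 'BEGIN{FPAT="[[:alpha:]]\\>"}{for (i=1;i<=NF;i++) print i, $i}' file.txt
1 G
2 M
3 U
4 C
5 H
6 O
7 S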
Another solution with GNU awk:
awk '{$0=gensub(/[^[:space:]]*([[:alpha:]])/, "\\1","g"); gsub(/\s/,"")} 1' file
GMUCHOS
gensub() gets here the characters and gsub() removes the spaces between them.
or using patsplit():
awk 'n=patsplit($0, a, /[[:alpha:]]\>/) { for (i in a) printf "%s", a[i]} i==n {print ""}' file
GMUCHOS
An alternate approach with GNU awk is to use FPAT to keep just the matched content as fields:
gawk 'BEGIN{FPAT="\\S\\>"}
{ s=""
for (i=1; i<=NF; i++) s=s $i
print s
}' file
GMUCHOS
Or, more tersely and idiomatically:
gawk 'BEGIN{FPAT="\\S\\>";OFS=""}{$1=$1}1' file
GMUCHOS
(Thanks Daweo for this)
You can also use gensub with:
gawk '{print gensub(/\S*(\S\>)\s*/,"\\1","g")}' file
GMUCHOS
The advantage of both of these is that single-letter "words" are handled properly:
s2='SINGLE X LETTER Z'
gawk 'BEGIN{FPAT="\\S\\>";OFS=""}{$1=$1}1' <<< "$s2"
EXRZ
gawk '{print gensub(/\S*(\S\>)\s*/,"\\1","g")}' <<< "$s2"
EXRZ
Where the accepted answer and most here do not:
awk '{for (i=1;i<=NF;i++) r = r "" substr($i,length($1))} END{print r}' <<< "$s2"
ER # WRONG
gawk '{print gensub(/([^ ]+)([^ ])( |$)/,"\\2","g")}' <<< "$s2"
EX RZ # WRONG

Edited: Grep/Awk- Print specific info from table

(This example has been edited, following a user's recommendation, to correct a mistake in my table display.)
I have a .csv table from which I need certain info. My table looks like this:
Name, Birth
James,2001/02/03 California
Patrick,2001/02/03 Texas
Sarah,2000/03/01 Alabama
Sean,2002/02/01 New York
Michael,2002/02/01 Ontario
From here, I would need to print only the unique birthdates, in an ascending order, like this:
2000/03/01
2001/02/03
2002/02/01
I have thought of a regular expression to identify the dates, such as:
awk '/[0-9]{4}/[0-9]{2}/[0-9]/{2}/' students.csv
However, I'm getting a syntax error in the regex, and I don't know how to proceed from this step.
Any hints?
Use cut to isolate the dates and sort with the -u option to print unique values (skipping the header line first):
tail -n +2 students.csv | cut -d, -f2 | cut -d' ' -f1 | sort -u > out_file
You can also use grep instead of cut:
grep -Po '\d\d\d\d/\d\d/\d\d' students.csv | sort -u > out_file
Here, GNU grep uses the following options:
-P : Use Perl regexes.
-o : Print the matches only (1 match per line), not the entire lines.
SEE ALSO:
perlre - Perl regular expressions
Here is a GNU awk solution to get this done in a single command:
awk -F'[, ]' 'NR > 1 && !seen[$2]++{} END {
PROCINFO["sorted_in"]="#ind_str_asc"; for (i in seen) print i}' file
2000/03/01
2001/02/03
2002/02/01
Using any awk, and regardless of whether your names have 1 word or more and whether blank chars exist after the commas or not:
$ awk -F', *' 'NR>1{sub(/ .*/,"",$2); print $2}' file | sort -u
2000/03/01
2001/02/03
2002/02/01
With your shown samples, could you please try the following. Written and tested in GNU awk; it should work in any awk, though.
awk '
match($0,/[0-9]{4}(\/[0-9]{2}){2}/){
arrDate[substr($0,RSTART,RLENGTH)]
}
END{
for(i in arrDate){
print i
}
}
' Input_file
Explanation: here is a detailed explanation of the above.
awk ' ##Starting awk program from here.
match($0,/[0-9]{4}(\/[0-9]{2}){2}/){ ##using match function to match regex to match only date format.
arrDate[substr($0,RSTART,RLENGTH)] ##Creating array arrDate which has index as sub string of matched one.
}
END{ ##Starting END block of this awk program from here.
for(i in arrDate){ ##Traversing through arrDate here.
print i ##Printing index of array here.
}
}
' Input_file ##Mentioning Input_file name here.
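Note that for(i in arrDate) iterates in no particular order in a POSIX awk, so if you need the ascending order shown in the question you may want to pipe the output through sort:
awk '
match($0,/[0-9]{4}(\/[0-9]{2}){2}/){
arrDate[substr($0,RSTART,RLENGTH)]
}
END{
for(i in arrDate){
print i
}
}
' Input_file | sort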

bash scripting - using sed or awk to split and extract data

I'm having trouble with a specific situation. If I have a file filled with entries like:
my.site.example.com
somelinewithnodot
some.line .with.a.weird.space..this.is
this.one.has , and.stuff*.all.I
&&&83%23^&4,I;dont,even.need.2see
Using bash, how can I use awk or sed or something similar to split the data on each line by "." and then print only the parts directly before and directly after the last ".", leaving lines with no "." as they are?
Desired output:
example.com
somelinewithnodot
this.is
all.I
need.2see
I've been trying to use sed but I'm having trouble setting up the regex. I've done stuff like this before but it's been a minute and I'm having trouble remembering how to properly set it up...
Could you please try the following.
awk -F'.' 'NF>1{print $(NF-1) FS $NF;next} 1' Input_file
OR
awk 'BEGIN{FS=OFS="."}NF>1{print $(NF-1) FS $NF;next} 1' Input_file
OR
awk -F'.' 'NF>1{$0=$(NF-1) FS $NF} 1' Input_file
You can use substitution with sed:
sed 's/^\([^.]*\.\)*\([^.]\+\.[^.]\+\)$/\2/'
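Running it over the sample file gives the desired output (note that \+ is a GNU sed extension; with a strictly POSIX sed you can write \{1,\} instead):
sed 's/^\([^.]*\.\)*\([^.]\+\.[^.]\+\)$/\2/' file
example.com
somelinewithnodot
this.is
all.I
need.2see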
This might work for you (GNU sed):
sed -E 's/.*[.](.*[.].*)$/\1/' file
Greedily match everything up to and including the second-to-last . and replace the whole line with the capture group: the last . and the words on either side of it.
Alternative:
sed 's/.*\.\(.*\..*\)$/\1/' file
You can also try Perl:
perl -ne ' /(^[^\.]+$)|(?<=\.)([^\.]+\.[^\.]+$)/g and print "$1$2" '
With the inputs:
$ cat johnred.txt
my.site.example.com
somelinewithnodot
some.line .with.a.weird.space..this.is
this.one.has , and.stuff*.all.I
&&&83%23^&4,I;dont,even.need.2see
$ perl -ne ' /(^[^\.]+$)|(?<=\.)([^\.]+\.[^\.]+$)/g and print "$1$2" ' johnred.txt
example.com
somelinewithnodot
this.is
all.I
need.2see
$
The . loses its special meaning when used inside [ ], so you can also use:
perl -ne ' /(^[^.]+$)|(?<=\.)([^.]+\.[^.]+$)/g and print "$1$2" ' johnred.txt
Another solution, using an array operation:
perl -lne ' @b=$_=~/([^.]+)/g ; print $b[-2]? "$b[-2].":"", $b[-1] ' johnred.txt

Regex, get what's after the second occurrence of a string

I have a string of the following format:
TEXT####TEXT####SPECIALTEXT
I need to get the SPECIALTEXT, basically what is after the second occurrence of the ####. I can't get it done. Thanks
The regex (?:.*?####){2}(.*) captures what you're looking for in its first group.
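For example, a quick way to test it from a shell (assuming the string is in $s; the same pattern works in most PCRE-style engines):
s='TEXT####TEXT####SPECIALTEXT'
perl -ne 'print "$1\n" if /(?:.*?####){2}(.*)/' <<< "$s"
SPECIALTEXT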
If you are using a shell and can use awk for it:
From a file:
awk 'BEGIN{FS="####"} {print $3}' input_file
From a variable:
awk 'BEGIN{FS="####"} {print $3}' <<< "$input_variable"