Regex to search for a string between two slashes

I have a question about bash shell scripting. I want to extract a string that sits between two slashes, where the slash is the delimiter.
Let's say the string is /one/two/; I want to pick up just one.
How can I achieve this in a shell script? Any pointers are greatly appreciated.

Use the -F flag of awk to set the delimiter to /. Because the line starts with /, field $1 is empty, so the first and second segments are $2 and $3.
$ cat /my/file
/one/two/
$ awk -F/ '{print $2}' /my/file
one
$ awk -F/ '{print $3}' /my/file
two
If the string is in a variable, you can pipe it to awk.
#!/bin/bash
var=/one/two/
echo "$var" | awk -F/ '{print $2}'
echo "$var" | awk -F/ '{print $3}'

path="/one/two/"
path=${path#/} # Remove leading /
path=${path%%/*} # Remove everything after first /
echo "$path" # Is now "one"

Using a bash regular expression:
$ str="/one/two/"
$ re="/([^/]*)/[^/]*/"
$ [[ $str =~ $re ]] && echo "${BASH_REMATCH[1]}"
one
$

Using cut:
$ str="/one/two/"
$ echo "$str" | cut -d/ -f2
one
$

Convert your string to an array, delimited with / and read the necessary element:
$ str="/one/two/"
$ IFS='/' read -ra a <<< "$str"; echo "${a[1]}"
one
$

And a couple more:
> cut -f 2 -d "/" <<< "/one/two"
one
> awk -F "/" '{print $2}' <<< "/one/two"
one
> oldifs="$IFS"; IFS="/"; var="/one/two/"; set -- $var; echo "$2"; IFS="$oldifs"
one
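The parameter-expansion answer above can be generalized to pick any segment, not just the first. The following is a minimal sketch rather than a definitive implementation; the function name `path_segment` is hypothetical, and it uses plain (global) variables so that it also runs in a POSIX sh.

```shell
# path_segment PATH N: print the Nth slash-delimited segment of PATH (1-based).
# Hypothetical helper; one leading and one trailing slash are ignored.
path_segment() {
    seg_path=${1#/}             # drop one leading slash, if any
    seg_path=${seg_path%/}      # drop one trailing slash, if any
    seg_n=$2
    while [ "$seg_n" -gt 1 ]; do
        case $seg_path in
            */*) seg_path=${seg_path#*/} ;;  # drop the first remaining segment
            *)   return 1 ;;                 # fewer than N segments
        esac
        seg_n=$((seg_n - 1))
    done
    printf '%s\n' "${seg_path%%/*}"          # keep the text before the next slash
}

path_segment /one/two/ 1
path_segment /one/two/ 2
```

Because it uses only parameter expansion, no external process is started, which can matter inside a tight loop.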

Related

Bash extract project name from url github

I have the following code:
url='https://github.com/Project/name-app.git'
echo $url
I need to extract just the project name from this URL, regardless of who owns the project.
Result:
name-app
You can use string manipulation in Bash:
url='https://github.com/Project/name-app.git'
url="${url##*/}" # Remove all up to and including last /
url="${url%.*}" # Remove up to the . (including) from the right
echo "$url"
# => name-app
Another approach with awk:
url='https://github.com/Project/name-app.git'
url=$(awk -F/ '{sub(/\..*/,"",$NF); print $NF}' <<< "$url")
echo "$url"
Here, the field delimiter is set to /, the last field is taken, everything from its first . onward is removed, and the result is printed.
You can use grep regexp:
url='https://github.com/Project/name-app.git'
echo "$url" | grep -oP ".*/\K[^\.]*"
You can use bash expansion:
url='https://github.com/Project/name-app.git'
mytemp=${url##*/}
echo "${mytemp%.*}"
1st solution: With the samples shown, try the following awk code.
echo "$url" | awk -F'\\.|/' '{print $(NF-1)}'
Or, to print a result only when the URL ends with .git, tweak the code above slightly:
echo "$url" | awk -F'\\.|/' '$NF=="git"{print $(NF-1)}'
2nd solution: Using sed, try the following:
echo "$url" | sed 's/.*\///;s/\..*//'
# my preferred solution
url='https://github.com/Project/name-app.git'
basename "$url" .git
> name-app
# others
sed 's/\(.*\/\)\([^\/]*\)\.\(\w*\)$/\2/g' <<<"$url"
# capture group 1: https://github.com/Project/
# capture group 2: name-app
#
awk -F"[/.]" '{print $(NF-1)}' <<<$url
awk -F"[/.]" '{print $6}' <<<$url
tr '/.' '\n' <<<$url|tail -2|grep -v git
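One detail worth noting about the parameter-expansion answers above: `${url%.*}` removes whatever follows the last dot, so a project called `name.app` cloned without the `.git` suffix would come back as `name`. A sketch that strips only a literal `.git` suffix is shown below; the function name `repo_name` is hypothetical, and tolerating a trailing slash is an added assumption, not something the question asked for.

```shell
# repo_name URL: print the last path component of URL without a trailing ".git".
# Hypothetical helper built only from POSIX parameter expansion.
repo_name() {
    repo_url=${1%/}                     # tolerate one trailing slash
    repo_url=${repo_url##*/}            # keep the text after the last /
    printf '%s\n' "${repo_url%.git}"    # strip a trailing .git, nothing else
}

repo_name 'https://github.com/Project/name-app.git'
```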

Using grep regex to select to first hyphen

echo "Linux/DEB/mainbinary-0.1.20190424165331-0-armdef.deb" | grep -oE "([^\/]+$)"
This prints just the filename, without the directory structure, but I cannot manage to print just mainbinary from that string. Suggestions?
And a sed alternative to PS.'s great grep -oP answer:
echo "Linux/DEB/mainbinary-0.1.20190424165331-0-armdef.deb" |sed -r 's#^.*/([^-]+).*#\1#'
mainbinary
echo "Linux/DEB/mainbinary-0.1.20190424165331-0-armdef.deb" |grep -oP '.*/\K[^-]+'
mainbinary
This scans up to the last /, discards everything to its left (that is what \K does), and then matches up to, but not including, the first -.
With any awk in any shell on any UNIX machine:
$ echo "Linux/DEB/mainbinary-0.1.20190424165331-0-armdef.deb" | awk -F'[/-]' '{print $3}'
mainbinary
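The same result can be had without grep, sed, or awk. The sketch below uses two parameter expansions; the function name `name_before_hyphen` is hypothetical. Unlike the fixed-field awk command, it does not depend on how many directories precede the file name or on whether a directory name contains a hyphen.

```shell
# name_before_hyphen PATH: print the file name up to (not including) its first hyphen.
# Hypothetical helper; uses only POSIX parameter expansion.
name_before_hyphen() {
    nbh_file=${1##*/}                   # strip the directory part
    printf '%s\n' "${nbh_file%%-*}"     # strip from the first hyphen onward
}

name_before_hyphen "Linux/DEB/mainbinary-0.1.20190424165331-0-armdef.deb"
```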

How can I extract the timestamp from the end of a shell variable when the format isn't fixed?

I'm trying to extract the timestamp from the end of a shell variable like this:
Input=AEXP_CSTONE_EU_prpbdp_sourcefile_yyyymmddhhmmss.txt
TimeStamp=`echo $Input | awk -F"_" '{print $6}'`
This works for this particular case, but the format of the string can change. For example, it could also be:
Input=AEXP_CSTONE_EU_prpbdp_sourcefile_prospects_yyyymmddhhmmss.txt
The variable will always end with yyyymmddhhmmss.txt. How can I extract the timestamp consistently?
Given:
$ echo $Input
AEXP_CSTONE_EU_prpbdp_sourcefile_prospects_20151116141111.txt
You can use sed:
$ echo $Input | sed -n 's|.*_\([0-9]\{14\}\)\.txt|\1|p'
20151116141111
Or nested grep:
$ echo $Input | grep -Eo '_[0-9]{14}\.txt' | grep -Eo '[0-9]{14}'
20151116141111
awk:
$ echo $Input | awk -F_ '{split($NF, a, "."); print a[1]}'
20151116141111
Perl
$ echo $Input | perl -ne 'print $1 if /_(\d{14})\.txt/'
20151116141111
cut and rev:
$ echo $Input | rev | cut -d'_' -f 1 | rev | cut -d'.' -f 1
20151116141111
Bash:
$ last=${Input##*_}
$ echo $last
20151116141111.txt
$ ts=${last%.*}
$ echo $ts
20151116141111
In summary, lots of ways...
If you don't want to lose the .txt part, it's even easier:
$ echo $Input | sed -n 's|.*_\([0-9]\{14\}\.txt\)|\1|p'
20151116141111.txt
$ echo $Input | grep -Eo '[0-9]{14}\.txt$'
20151116141111.txt
$ echo $Input | awk -F_ '{print $NF}'
20151116141111.txt
$ echo $Input | perl -ne 'print $1 if /_(\d{14}\.txt)/'
20151116141111.txt
$ echo $Input | rev | cut -d'_' -f 1 | rev
20151116141111.txt
$ last=${Input##*_}
$ echo $last
20151116141111.txt
You need to match the part that will not change then:
TimeStamp=$(echo $Input | perl -pe 's/.*(\d{14})\.txt/$1/')
You are extracting the 6th field separated by _, yet it seems you really want to extract the last field. You can do that with parameter expansion:
timestamp=${Input##*_}
timestamp=${timestamp%.txt}
See BashFAQ 100 for more on string manipulation in bash.
In awk, you'd use $NF to reference the last field, though awk is overkill for this.
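The parameter-expansion approach does not check that what it extracted really is a timestamp. A sketch that adds that check is shown below; the function name `extract_timestamp` is hypothetical, and the rule "exactly 14 digits" is taken from the yyyymmddhhmmss format stated in the question.

```shell
# extract_timestamp NAME: print the 14-digit timestamp that ends NAME (before .txt).
# Hypothetical helper; returns non-zero if the last field is not exactly 14 digits.
extract_timestamp() {
    ts=${1##*_}                         # text after the last underscore
    ts=${ts%.txt}                       # drop the .txt suffix
    case $ts in
        ''|*[!0-9]*) return 1 ;;        # empty, or contains a non-digit
    esac
    [ "${#ts}" -eq 14 ] || return 1     # must be exactly 14 digits long
    printf '%s\n' "$ts"
}

extract_timestamp AEXP_CSTONE_EU_prpbdp_sourcefile_prospects_20151116141111.txt
extract_timestamp AEXP_CSTONE_EU_prpbdp_sourcefile_yyyymmddhhmmss.txt || echo "no timestamp found"
```

The second call uses the literal placeholder name from the question, which contains no digits, so it takes the failure branch.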

Search regex on a specific field using awk

In awk I can search a field for a value like:
$ echo -e "aa,bb,cc\ndd,eaae,ff" | awk 'BEGIN{FS=",";}; $2=="eaae" {print $0};'
dd,eaae,ff
And I can search by regular expressions like
$ echo -e "aa,bb,cc\ndd,eaae,ff" | awk 'BEGIN{FS=",";}; /[a]{2}/ {print $0};'
aa,bb,cc
dd,eaae,ff
Can I force awk to apply the regexp search to a specific field? I'm looking for something like
$ echo -e "aa,bb,cc\ndd,eaae,ff" | awk 'BEGIN{FS=",";}; $2==/[a]{2}/ {print $0};'
expecting result:
dd,eaae,ff
Anyone know how to do it using awk?
Accepted response - Operator "~" (thanks to hek2mgl):
$ echo -e "aa,bb,cc\ndd,eaae,ff" | awk 'BEGIN{FS=",";}; $2 ~ /[a]{2}/ {print $0};'
To apply the regex to one field only (the second field, for example), use:
$2 ~ /REGEX/ {ACTION}
In your case this would lead to:
awk -F, '$2 ~ /[a]{2}/' <<< $'aa,bb,cc\ndd,eaae,ff'
You may wonder why I've just used the regex in the awk program and no print. This is because your action is print $0 - printing the current line - which is the default action in awk.
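The field number and the regex can also be passed in as awk variables, which avoids editing the program text for each search. The following is a sketch under stated assumptions: the function name `match_field` is hypothetical, the input is comma-separated, and the pattern is written as `aa` rather than `a{2}` because some older awk implementations do not support interval expressions.

```shell
# match_field FIELD REGEX: read comma-separated lines from stdin and print
# those whose FIELD matches REGEX. Hypothetical helper.
# Note: awk processes backslash escapes in values passed with -v.
match_field() {
    awk -F, -v col="$1" -v re="$2" '$col ~ re'
}

printf 'aa,bb,cc\ndd,eaae,ff\n' | match_field 2 'aa'
```

Here `$col ~ re` treats the string in `re` as a dynamic regex, and the missing action again falls back to the default of printing the line.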

How to use sed to identify a string in brackets?

I want to find the string that is enclosed in brackets. How do I use sed to pull it out?
# cat /sys/block/sdb/queue/scheduler
noop anticipatory deadline [cfq]
I'm not getting the exact result
# cat /sys/block/sdb/queue/scheduler | sed 's/\[*\]//'
noop anticipatory deadline [cfq
I'm expecting an output
cfq
grep can be easier, especially if the position of the bracketed text changes from line to line:
$ grep -Po '(?<=\[)[^]]*' file
cfq
This uses a look-behind: after each [, it matches all the characters up to the next ].
See another example:
$ cat a
noop anticipatory deadline [cfq]
hello this [is something] we want to [enclose] yeah
$ grep -Po '(?<=\[)[^]]*' a
cfq
is something
enclose
You can also use awk for this, in case the bracketed text is always in the same position:
$ awk -F'[][]' '{print $2}' file
cfq
It sets the field separators to [ and ] and prints the second field.
And with sed:
$ sed 's/[^[]*\[\([^]]*\).*/\1/g' file
cfq
It is a bit messy, but basically it looks for the block of text between [ and ] and prints it back.
I found one possible solution:
cut -d "[" -f2 | cut -d "]" -f1
so the exact solution is
# cat /sys/block/sdb/queue/scheduler | cut -d "[" -f2 | cut -d "]" -f1
Another potential solution is awk:
s='noop anticipatory deadline [cfq]'
awk -F'[][]' '{print $2}' <<< "$s"
cfq
Another way, with GNU grep:
grep -Po "\[\K[^]]*" file
With pure bash:
while read line; do [[ "$line" =~ \[([^]]*)\] ]] && echo "${BASH_REMATCH[1]}"; done < file
Another awk:
echo 'noop anticipatory deadline [cfq]' | awk '{gsub(/.*\[|\].*/,x)}8'
cfq
perl -lne 'print $1 if(/\[([^\]]*)\]/)'
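For a single bracketed value, as in the scheduler line, the two-step cut pipeline has a parameter-expansion equivalent that starts no external processes. This is a minimal sketch; the function name `bracketed` is hypothetical, and it returns only the first bracketed value on the line.

```shell
# bracketed STRING: print the text inside the first [...] pair of STRING.
# Hypothetical helper; returns non-zero if there is no such pair.
bracketed() {
    case $1 in
        *\[*\]*) ;;                     # a [ followed later by a ]
        *) return 1 ;;
    esac
    br=${1#*\[}                         # drop everything through the first [
    printf '%s\n' "${br%%\]*}"          # drop from the first ] onward
}

bracketed 'noop anticipatory deadline [cfq]'
```

For lines that may hold several bracketed values, the grep -Po answers above remain the simpler choice, since they print every match.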