Bash extract project name from url github - regex

I have the following code:
url='https://github.com/Project/name-app.git'
echo $url
I need to extract the name of the project from this URL, regardless of who owns the project.
Result:
name-app

You can use string manipulation in Bash:
url='https://github.com/Project/name-app.git'
url="${url##*/}" # Remove all up to and including last /
url="${url%.*}" # Remove up to the . (including) from the right
echo "$url"
# => name-app
Another approach with awk:
url='https://github.com/Project/name-app.git'
url=$(awk -F/ '{sub(/\..*/,"",$NF); print $NF}' <<< "$url")
echo "$url"
Here, the field delimiter is set to /, the last field is taken, everything from its first . onward is removed (including the .), and the result is printed.
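The expansions above cut at a dot, so a repository whose name itself contains a dot would be truncated. A minimal sketch that removes only a literal .git suffix and tolerates a trailing slash might look like this; the function name repo_name is an assumption made for illustration, not something from the answers above.

```bash
# Hypothetical helper: print the repository name from a Git URL.
repo_name() {
  local name
  name="${1%/}"          # drop one trailing slash, if present
  name="${name##*/}"     # remove everything up to and including the last /
  name="${name%.git}"    # remove a trailing .git, and nothing else
  printf '%s\n' "$name"
}

repo_name 'https://github.com/Project/name-app.git'
# => name-app
```

Because only the .git suffix is stripped, a URL such as https://github.com/Project/name.app keeps its full last component.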

You can use grep with a Perl-compatible regex (this needs GNU grep's -P option):
url='https://github.com/Project/name-app.git'
echo "$url" | grep -oP '.*/\K[^.]*'
You can use bash expansion:
url='https://github.com/Project/name-app.git'
mytemp=${url##*/}
echo ${mytemp%.*}

1st solution: With your shown samples, please try the following awk code.
echo "$url" | awk -F'\\.|/' '{print $(NF-1)}'
Or, to make sure the URL actually ends with .git, tweak the code above slightly:
echo "$url" | awk -F'\\.|/' '$NF=="git"{print $(NF-1)}'
2nd solution: Using sed, try the following:
echo "$url" | sed 's/.*\///;s/\..*//'

# my preferred solution
url='https://github.com/Project/name-app.git'
basename "$url" .git
> name-app
# others
sed 's/\(.*\/\)\([^\/]*\)\.\(\w*\)$/\2/g' <<<"$url"
# capture groups: \1 = https://github.com/Project/, \2 = name-app
#
awk -F"[/.]" '{print $(NF-1)}' <<<$url
awk -F"[/.]" '{print $6}' <<<$url
tr '/.' '\n' <<<$url|tail -2|grep -v git
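Since the question title asks for a regex, here is a sketch that uses Bash's built-in =~ operator instead of an external tool. It assumes the URL ends in .git and returns a non-zero status otherwise; the function name repo_name_re is hypothetical.

```bash
# Hypothetical helper: extract the repository name with a Bash regex.
repo_name_re() {
  # A slash, then the last path component (captured), then a literal .git at the end.
  local re='/([^/]+)\.git$'
  if [[ $1 =~ $re ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}"
  else
    return 1   # the URL does not end in /<name>.git
  fi
}

repo_name_re 'https://github.com/Project/name-app.git'
# => name-app
```

Keeping the pattern in a variable avoids quoting problems on the right-hand side of =~.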

Related

Using grep regex to select to first hyphen

echo "Linux/DEB/mainbinary-0.1.20190424165331-0-armdef.deb" | grep -oE "([^\/]+$)"
This prints just the filename, without the directory structure, but I cannot manage to print just mainbinary from that string. Suggestions?
And a sed alternative to PS.'s great grep -oP
echo "Linux/DEB/mainbinary-0.1.20190424165331-0-armdef.deb" |sed -r 's#^.*/([^-]+).*#\1#'
mainbinary
echo "Linux/DEB/mainbinary-0.1.20190424165331-0-armdef.deb" |grep -oP '.*/\K[^-]+'
mainbinary
This scans up to the last /, discards everything matched so far (\K), and then matches up to, but not including, the first -.
With any awk in any shell on any UNIX machine:
$ echo "Linux/DEB/mainbinary-0.1.20190424165331-0-armdef.deb" | awk -F'[/-]' '{print $3}'
mainbinary
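The same result can be had without any external tool. The sketch below uses only parameter expansion; the function name name_before_hyphen is an assumption made for illustration.

```bash
# Hypothetical helper: print the file name up to (not including) its first hyphen.
name_before_hyphen() {
  local file
  file="${1##*/}"               # strip the directory part
  printf '%s\n' "${file%%-*}"   # strip from the first - to the end
}

name_before_hyphen 'Linux/DEB/mainbinary-0.1.20190424165331-0-armdef.deb'
# => mainbinary
```

Unlike the awk version with a fixed field number, this does not depend on how many directories precede the file name.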

How to use sed to identify a string in brackets?

I want to find the string that is placed within the brackets. How do I use sed to pull out the string?
# cat /sys/block/sdb/queue/scheduler
noop anticipatory deadline [cfq]
I'm not getting the exact result
# cat /sys/block/sdb/queue/scheduler | sed 's/\[*\]//'
noop anticipatory deadline [cfq
I'm expecting an output
cfq
This can be easier with grep, especially if the position of the bracketed text varies:
$ grep -Po '(?<=\[)[^]]*' file
cfq
This uses a look-behind: after each [, it matches all the characters up to, but not including, the next ].
See another example:
$ cat a
noop anticipatory deadline [cfq]
hello this [is something] we want to [enclose] yeah
$ grep -Po '(?<=\[)[^]]*' a
cfq
is something
enclose
You can also use awk for this, in case it is always in the same position:
$ awk -F'[][]' '{print $2}' file
cfq
This sets the field separator to either [ or ] (quoted, so the shell does not treat it as a glob) and prints the second field.
And with sed:
$ sed 's/[^[]*\[\([^]]*\).*/\1/g' file
cfq
It is a bit messy, but basically it looks for the block of text in between [] and prints it back.
I found one possible solution-
cut -d "[" -f2 | cut -d "]" -f1
so the exact solution is
# cat /sys/block/sdb/queue/scheduler | cut -d "[" -f2 | cut -d "]" -f1
Another potential solution is awk:
s='noop anticipatory deadline [cfq]'
awk -F'[][]' '{print $2}' <<< "$s"
cfq
Another way, with GNU grep:
grep -Po "\[\K[^]]*" file
with pure shell:
while read line; do [[ "$line" =~ \[([^]]*)\] ]] && echo "${BASH_REMATCH[1]}"; done < file
Another awk
echo 'noop anticipatory deadline [cfq]' | awk '{gsub(/.*\[|\].*/,x)}8'
cfq
perl -lne 'print $1 if(/\[([^\]]*)\]/)'
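The pure-shell loop above prints only the first bracketed string on each line. A sketch that prints every one, by repeatedly matching against the remainder of the string, could look like this; the function name brackets_all is hypothetical.

```bash
# Hypothetical helper: print every [bracketed] substring, one per line.
brackets_all() {
  local rest=$1
  # Group 1: text inside the first [...]; group 2: everything after its closing ].
  local re='\[([^]]*)](.*)'
  while [[ $rest =~ $re ]]; do
    printf '%s\n' "${BASH_REMATCH[1]}"
    rest=${BASH_REMATCH[2]}   # continue with the text after the match
  done
}

brackets_all 'hello this [is something] we want to [enclose] yeah'
# => is something
# => enclose
```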

regex to search for a string between two slashes

I have a question in bash shell scripting. I am looking to search a string between two slashes. Slash is a delimiter here.
Let's say the string is /one/two/; I want to be able to pick up just one.
How can I achieve this in a shell script? Any pointers are greatly appreciated.
Use the -F flag of awk to set the delimiter to /. Because the string starts with /, field $1 is empty, so the first component is $2 and the second is $3.
$ cat /my/file
/one/two/
$ awk -F/ '{print $2}' /my/file
one
$ awk -F/ '{print $3}' /my/file
two
If the string is in a variable, you can pipe it to awk.
#!/bin/bash
var=/one/two/
echo $var | awk -F/ '{print $2}'
echo $var | awk -F/ '{print $3}'
path="/one/two/"
path=${path#/} # Remove leading /
path=${path%%/*} # Remove everything after first /
echo "$path" # Is now "one"
Using a bash regular expression:
$ str="/one/two/"
$ re="/([^/]*)/[^/]*/"
$ [[ $str =~ $re ]] && echo "${BASH_REMATCH[1]}"
one
$
Using cut:
$ str="/one/two/"
$ echo "$str" | cut -d/ -f2
one
$
Convert your string to an array, delimited with / and read the necessary element:
$ str="/one/two/"
$ IFS='/' read -ra a <<< "$str"; echo "${a[1]}"
one
$
And a couple more:
> cut -f 2 -d "/" <<< "/one/two"
one
> awk -F "/" '{print $2}' <<< "/one/two"
one
> oldifs="$IFS"; IFS="/"; var="/one/two/"; set -- $var; echo "$2"; IFS="$oldifs"
one
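The IFS-based approaches above can be wrapped in a function so that the IFS change applies only to a single read and the caller's positional parameters are left alone. This is a sketch under those assumptions; path_component and its arguments are names invented for illustration.

```bash
# Hypothetical helper: print the Nth slash-separated component of a path.
# For a path with a leading slash, component 0 is the empty string before it.
path_component() {
  local path=$1 index=$2
  local -a parts
  # IFS is changed for this read only; -r keeps backslashes literal.
  IFS=/ read -ra parts <<< "$path"
  printf '%s\n' "${parts[$index]}"
}

path_component /one/two/ 1
# => one
path_component /one/two/ 2
# => two
```

Reading into an array also avoids the pathname expansion that an unquoted $var would undergo.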

sed or awk to capture part of url

I am not very experienced with regular expressions and sed/awk scripting.
I have urls that are similar to the following torrent url:
http://torcache.net/torrent/D7249CD9AF321C8578B3A7007ABBDD63B0475EEB.torrent?title=[kickass.to]against.the.ropes.by.carly.fall.epub.torrent
I would like to have sed or awk script extract the text after the title i.e
from the example above just get:
[kickass.to]against.the.ropes.by.carly.fall.epub.torrent
A simple approach with awk: use the = as the field separator:
awk -F"=" '{print $2}'
Thus:
echo "http://torcache.net/torrent/D7249CD9AF321C8578B3A7007ABBDD63B0475EEB.torrent?title=[kickass.to]against.the.ropes.by.carly.fall.epub.torrent" | awk -F"=" '{print $2}'
[kickass.to]against.the.ropes.by.carly.fall.epub.torrent
Just remove everything before the title=: sed 's/.*title=//'
$ echo "http://torcache.net/torrent/D7249CD9AF321C8578B3A7007ABBDD63B0475EEB.torrent?title=[kickass.to]against.the.ropes.by.carly.fall.epub.torrent" | sed 's/.*title=//'
[kickass.to]against.the.ropes.by.carly.fall.epub.torrent
Let's say:
s='http://torcache.net/torrent/D7249CD9AF321C8578B3A7007ABBDD63B0475EEB.torrent?title=[kickass.to]against.the.ropes.by.carly.fall.epub.torrent'
Pure BASH solution:
echo "${s/*title=}"
[kickass.to]against.the.ropes.by.carly.fall.epub.torrent
OR using grep -P:
echo "$s"|grep -oP 'title=\K.*'
[kickass.to]against.the.ropes.by.carly.fall.epub.torrent
Using sed (no need to mention title in the regex for your example):
sed 's/.*=//'
Another solution uses cut, another standard Unix tool:
cut -d= -f2
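The one-liners above rely on the URL having a single = sign: cut -d= -f2 drops anything after a second =, and sed 's/.*=//' keeps only what follows the last one. For URLs with several query parameters, a sketch that looks up one parameter by name might be useful. The function name url_param and the example.com URL in the usage line are assumptions made for illustration.

```bash
# Hypothetical helper: print the value of one query parameter from a URL.
url_param() {
  local url=$1 name=$2 query pair
  local -a pairs
  [[ $url == *\?* ]] || return 1        # no query string at all
  query=${url#*\?}                      # text after the first ?
  IFS='&' read -ra pairs <<< "$query"   # split into name=value pairs
  for pair in "${pairs[@]}"; do
    if [[ $pair == "$name="* ]]; then
      printf '%s\n' "${pair#*=}"        # everything after the first =
      return 0
    fi
  done
  return 1                              # parameter not present
}

url_param 'http://example.com/a.torrent?id=7&title=[kickass.to]some.book.epub.torrent' title
# => [kickass.to]some.book.epub.torrent
```

The value is printed as-is; percent-decoding is not attempted.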

Substitute a regex pattern using awk

I am trying to write a regex expression to replace one or more '+' symbols present in a file with a space. I tried the following:
echo This++++this+++is+not++done | awk '{ sub(/\++/, " "); print }'
This this+++is+not++done
Expected:
This this is not done
Any ideas why this did not work?
Use gsub which does global substitution:
echo This++++this+++is+not++done | awk '{gsub(/\++/," ");}1'
The sub function replaces only the first match; to replace all matches, use gsub.
Or the tr command:
echo This++++this+++is+not++done | tr -s '+' ' '
The idiomatic awk solution would be just to translate the input field separator to the output separator:
$ echo This++++this+++is+not++done | awk -F'[+]+' '{$1=$1}1'
This this is not done
Try this
echo "This++++this+++is+not++done" | sed -re 's/(\+)+/ /g'
You could use sed too.
echo This++++this+++is+not++done | sed -e 's/+\{1,\}/ /g'
This matches one or more + and replaces it with a space.
For this case I recommend sed: it is powerful for substitution and has a short syntax.
Solution sed:
echo This++++this+++is+not++done | sed -En 's/\++/ /gp'
Result:
This this is not done
For awk:
You must use the gsub function for global substitution (more than one substitution per line).
The syntax:
gsub(regexp, replacement [, target])
If the third parameter is omitted, $0 is the target.
The target must be a variable, field, or array element. gsub works on the target in place, overwriting it with the result.
Solution awk:
echo This++++this+++is+not++done | awk 'gsub(/\++/," ")'
Result:
This this is not done
echo "This++++this+++is+not++done" | sed 's/++*/ /g'
If you have access to node on your computer, you can do it by installing rexreplace
npm install -g rexreplace
and then run
rexreplace '\++' ' ' myfile.txt
Or, if you have more files in a dir data, you can do
rexreplace '\++' ' ' data/*.txt
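If the text is already in a shell variable, Bash can do the same substitution without awk, sed, or tr. This sketch relies on the extglob shell option, whose +(pattern) form matches one or more occurrences of the pattern; the function name squeeze_plus is hypothetical.

```bash
shopt -s extglob   # enable +(...) patterns before they are used

# Hypothetical helper: replace each run of one or more + with a single space.
squeeze_plus() {
  local s=$1
  printf '%s\n' "${s//+(+)/ }"
}

squeeze_plus 'This++++this+++is+not++done'
# => This this is not done
```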