Regular expression to extract the string after a colon in bash

I need to extract the string after the : in an example below:
package:project.abc.def
Here I would get project.abc.def as the result.
I am attempting this in bash and I believe I have a regular expression that will work: :([^:]*)$
In my bash script I have package:project.abc.def in a variable called apk. How do I assign the substring matched by the regular expression back to that same variable?
In other words, apk initially holds package:project.abc.def and should end up holding project.abc.def.
Thanks!

There is no need for a regex here, just a simple prefix substitution:
$ apk="package:project.abc.def"
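$ apk=${apk##package:}
$ echo "$apk"
project.abc.def
The ## syntax is one of bash's parameter expansions: ${var##pattern} removes the longest prefix matching pattern, while a single # removes the shortest. Likewise, % and %% trim a matching suffix from the end. See the "Parameter Expansion" section of the bash man page for the details.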
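With a literal pattern like package: the shortest and longest prefix are the same, so # and ## behave identically. The difference only shows up when the pattern contains a wildcard. The sketch below uses a hypothetical input with two colons; the helper function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helpers showing the four trimming forms of parameter expansion.
after_first_colon()  { printf '%s\n' "${1#*:}"; }   # '#':  remove shortest prefix matching '*:'
after_last_colon()   { printf '%s\n' "${1##*:}"; }  # '##': remove longest prefix matching '*:'
before_last_colon()  { printf '%s\n' "${1%:*}"; }   # '%':  remove shortest suffix matching ':*'
before_first_colon() { printf '%s\n' "${1%%:*}"; }  # '%%': remove longest suffix matching ':*'

s="package:project.abc.def:extra"   # hypothetical input with a second colon
after_first_colon "$s"    # project.abc.def:extra
after_last_colon "$s"     # extra
before_last_colon "$s"    # package:project.abc.def
before_first_colon "$s"   # package
```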
Some alternatives:
$ apk=$(echo "$apk" | awk -F'package:' '{print $2}')
$ apk=$(echo "$apk" | sed 's/^package://')
$ apk=$(echo "$apk" | cut -d':' -f2)
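One caveat with the cut variant: -f2 returns only the second field, so anything after a further colon is lost. If the value might contain more colons, -f2- (field 2 through the last field) keeps the rest. A small sketch, assuming a hypothetical input with an extra colon and made-up helper names:

```shell
#!/usr/bin/env bash
# Hypothetical helpers contrasting a single field with an open-ended field range.
field_2()    { printf '%s\n' "$1" | cut -d':' -f2; }    # only field 2
field_2_on() { printf '%s\n' "$1" | cut -d':' -f2-; }   # field 2 through the last field

s="package:project.abc.def:extra"   # hypothetical input
field_2 "$s"      # project.abc.def
field_2_on "$s"   # project.abc.def:extra
```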

$ string="package:project.abc.def"
$ apk=$(echo "$string" | sed 's/.*://')
".*:" matches everything up to and including the last ':' (.* is greedy), and that part is removed from the string.
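Because .* is greedy, this sed expression strips through the last colon, not the first. If only the text up to the first colon should go, a negated bracket expression that cannot cross a colon does that. A sketch with a hypothetical two-colon input and made-up function names:

```shell
#!/usr/bin/env bash
# Hypothetical helpers: greedy .* versus a class that stops at the first colon.
strip_through_last_colon()  { printf '%s\n' "$1" | sed 's/.*://'; }      # .* runs to the last ':'
strip_through_first_colon() { printf '%s\n' "$1" | sed 's/^[^:]*://'; }  # [^:]* cannot pass a ':'

s="package:project.abc.def:extra"   # hypothetical input
strip_through_last_colon "$s"    # extra
strip_through_first_colon "$s"   # project.abc.def:extra
```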

Capture groups from regular expressions can be found in the BASH_REMATCH array.
[[ $str =~ :([^:]*)$ ]]
# 0 is the substring that matches the entire regex
# n >= 1: the nth parenthesized group
apk=${BASH_REMATCH[1]}
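Wrapped into a reusable form, the same idea might look like the sketch below. The function name after_colon is hypothetical; it stores the regex in a variable (which sidesteps quoting problems on the right-hand side of =~) and returns a non-zero status when there is no colon, so the caller can tell "no match" apart from "empty match".

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the text after the last colon, or fail if there is none.
after_colon() {
  local re=':([^:]*)$'              # regex kept in a variable and expanded unquoted below
  if [[ $1 =~ $re ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}"
  else
    return 1                        # no colon in the input
  fi
}

apk="package:project.abc.def"
apk=$(after_colon "$apk")
echo "$apk"   # project.abc.def
```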

Related

In Bash, what is the string replacement pattern to remove any number of leading hyphens?

This strips out any number of the leading hyphens:
§ echo '--nom-nom' | perl -pe 's|^-+||'
nom-nom
What should the replacement pattern look like if I want to use bash string replacement to do the same? This does not work:
§ a=--nom-nom; a="${a/^-+/}"; echo $a
--nom-nom
Replacing all hyphens works, but that is not what I want:
§ a=--nom-nom; a="${a//-/}"; echo $a
nomnom
If the shell option extglob is set, you can use an extended pattern:
$ shopt -s extglob
$ a=--nom-nom; a="${a##*(-)}"; echo $a
nom-nom
If you don't want to always enable extglob, you can use a subshell to temporarily set it:
$ shopt -u extglob
$ a=--nom-nom; a=$(shopt -s extglob; echo "${a##*(-)}"); echo $a
${var##*(-)} uses the "remove longest matching prefix" replacement. You could also use ${var/#*(-)/}; in this context, the # forces the match to be initial. In both cases, *(pattern) means "nothing or any number of repetitions of 'pattern'", similar to regex syntax except that the * comes first and the parentheses are required.
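Both forms can be checked side by side. This is a sketch with hypothetical function names; shopt -s extglob is placed at the top so it is in effect before the functions that use *(-) are defined and called.

```shell
#!/usr/bin/env bash
shopt -s extglob   # enables *(...), +(...), ?(...), @(...), !(...)

# Hypothetical helpers; both remove the run of leading hyphens.
strip_hyphens_prefix() { printf '%s\n' "${1##*(-)}"; }   # remove longest prefix matching *(-)
strip_hyphens_subst()  { printf '%s\n' "${1/#*(-)/}"; }   # '#' anchors the substitution at the start

strip_hyphens_prefix --nom-nom   # nom-nom
strip_hyphens_subst --nom-nom    # nom-nom
strip_hyphens_prefix nom-nom     # nom-nom (no leading hyphens, nothing removed)
```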
If you want to use regular expressions, you can use the expr command:
$ expr "$a" : '-*\(.*\)'
nom-nom
Note that this is not a bash built-in. But it is required by Posix. It always uses Posix Basic Regular Expressions, which is why the capture parentheses need to be backslashed. (As noted in the documentation, it is expected that there will be precisely one capture group in the regex.)
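Since expr parses its arguments as an expression, a value that happens to look like an operator or option (for example "=", "(" or a string starting with "-") can confuse it. A common defensive idiom is to glue the same fixed character onto both the string and the pattern; the X below is arbitrary, and the function name is hypothetical.

```shell
#!/bin/sh
# Prefix both operands with a literal X so expr never receives a bare
# string that could be mistaken for an operator or option.
strip_hyphens_expr() {
  expr "X$1" : 'X-*\(.*\)'
}

strip_hyphens_expr --nom-nom   # nom-nom
```

Note that expr exits with status 1 when the captured text is empty (for an input such as ---), even though it still prints an empty line.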
You can capture what you want vs eliminate what you don't want with a Bash regex:
$ s='--nom-nom'
$ [[ $s =~ ^-*(.*) ]] && echo ${BASH_REMATCH[1]}
nom-nom

Excluding the first 3 characters of a string using regex

Given any string in bash, e.g. flaccid, I want to match all characters in the string but the first 3 (in this case I want to exclude "fla" and match only "ccid"). The regex also needs to work in sed.
I have tried positive look behind and the following regex expressions (as well as various other unsuccessful ones):
^.{3}+([a-z,A-Z]+)
sed -r 's/(?<=^....)(.[A-Z]*)/,/g'
Google hasn't been very helpful as it only produces results like "get first 3 characters ..."
Thanks in advance!
If you want to get all characters but the first 3 from a string, you can use cut:
str="flaccid"
cut -c 4- <<< "$str"
or bash variable substitution:
str="flaccid"
echo "${str:3}"
That will strip the first 3 characters out of your string.
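${str:3} is bash's substring expansion, ${var:offset} or ${var:offset:length}. A short sketch of the related forms on the same word:

```shell
#!/usr/bin/env bash
str="flaccid"
echo "${str:3}"     # ccid  (from offset 3 to the end)
echo "${str:0:3}"   # fla   (3 characters starting at offset 0)
echo "${str: -3}"   # cid   (negative offset counts from the end; the space
                    #        keeps it from being read as the :- default operator)
```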
You may just use a capturing group within an expression like ^.{3}(.*) / ^.{3}([a-zA-Z]+) and grab the ${BASH_REMATCH[1]} contents:
#!/bin/bash
text="flaccid"
rx="^.{3}(.*)"
if [[ $text =~ $rx ]]; then
echo "${BASH_REMATCH[1]}"
fi
In sed, you should also be using capturing groups / backreferences to get what you need. To just remove the first 3 chars, you may use a simple:
echo "flaccid" | sed 's/.\{3\}//'
The .\{3\} matches exactly 3 arbitrary chars. Since the g modifier is not used, only the first match is removed, and that match starts at the beginning of the line.
Now, both the solutions above will output ccid, i.e. everything except the first 3 chars.
Using sed, just remove them:
echo string | sed 's/^...//g'
How is it that no-one has named the most simple and portable solution, the shell "parameter expansion":
str="flaccid"
echo "${str#???}"
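The two shell approaches differ on strings shorter than three characters: ??? cannot match, so ${str#???} leaves the string unchanged (as sed 's/^...//' does), while ${str:3} expands to an empty string. A sketch with hypothetical helper names and a hypothetical short input:

```shell
#!/usr/bin/env bash
drop3_pattern()   { printf '%s\n' "${1#???}"; }  # POSIX pattern removal: needs 3 chars to match
drop3_substring() { printf '%s\n' "${1:3}"; }    # bash substring: offset past the end gives ""

drop3_pattern flaccid     # ccid
drop3_substring flaccid   # ccid
drop3_pattern ab          # ab (nothing removed)
drop3_substring ab        # (empty line)
```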
For a regex (bash):
$ str="flaccid"
$ regex='^.{3}(.*)$'
$ [[ $str =~ $regex ]] && echo "${BASH_REMATCH[1]}"
ccid
Same regex in sed:
$ echo "flaccid" | sed -E "s/$regex/\1/"
ccid
Or sed (Basic Regex):
$ echo "flaccid" | sed 's/^.\{3\}\(.*\)$/\1/'
ccid

Capture group from regex in bash

I have the following string /path/to/my-jar-1.0.jar for which I am trying to write a bash regex to pull out my-jar.
Now I believe the following regex would work: ([^\/]*?)-\d but I don't know how to get bash to run it.
The following: echo '/path/to/my-jar-1.0.jar' | grep -Po '([^\/]*?)-\d' captures my-jar-1
In Bash you can do:
s='/path/to/my-jar-1.0.jar'
[[ $s =~ .*/([^/[:digit:]]+)-[[:digit:]] ]] && echo "${BASH_REMATCH[1]}"
my-jar
Here "${BASH_REMATCH[1]}" will print captured group #1 which is expression inside first (...).
You can do this as well with shell prefix and suffix removal:
$ path=/path/to/my-jar-1.0.jar
# Remove the longest prefix ending with a slash
$ base="${path##*/}"
# Remove the longest suffix starting with a dash followed by a digit
$ base="${base%%-[0-9]*}"
$ echo "$base"
my-jar
Although it's a little annoying to have to do the transform in two steps, it has the advantage of only using Posix features so it will work with any compliant shell.
Note: The order is important, because the basename cannot contain a slash, but a path component could contain a dash. So you need to remove the path components first.
As far as I know, grep -o doesn't recognize capture groups; it prints only the entire match. That said, with Perl regexps (-P) you have the "lookahead" option to exclude the -\d from the match:
echo '/path/to/my-jar-1.0.jar' | grep -Po '[^/]*(?=-\d)'
Some reference material on lookahead/lookbehind:
http://www.perlmonks.org/?node_id=518444

grep on unix / linux: how to replace or capture text?

So I'm pretty good with regular expressions, but I'm having some trouble with them on unix. Here are two things I'd love to know how to do:
1) Replace all text except letters, numbers, and underscore
In PHP I'd do this: (works great)
preg_replace('#[^a-zA-Z0-9_]#','',$text).
In bash I tried this (with limited success); it seems like it doesn't allow you to use the full set of regex:
text="my #1 example!"
echo ${text/[^a-zA-Z0-9_]/''}
I tried it with sed but it still seems to have problems with the full regex set:
echo "my #1 example!" | sed s/[^a-zA-Z0-9\_]//
I'm sure there is a way to do it with grep, too, but it was breaking the result into multiple lines when I tried:
echo abc\!\#\#\$\%\^\&\*\(222 | grep -Eos '[a-zA-Z0-9\_]+'
And finally I also tried using expr but it seemed like that had really limited support for extended regex...
2) Capture (multiple) parts of text
In PHP I could just do something like this:
preg_match('#(word1).*(word2)#',$text,$matches);
I'm not sure how that would be possible in *nix...
Part 1
You are almost there with sed: just add the g modifier so that the replacement happens globally. Without g, the replacement happens only once.
$ echo "my #1 example!" | sed 's/[^a-zA-Z0-9_]//g'
my1example
$
You made the same mistake with your bash pattern replacement: it does not replace globally.
$ text="my #1 example!"
# non-global replacement. Only the space is deleted.
$ echo ${text/[^a-zA-Z0-9_]/''}
my#1 example!
# global replacement by adding an additional /
$ echo ${text//[^a-zA-Z0-9_]/''}
my1example
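The bracket expression can also be written with a POSIX character class, which bash accepts inside a negated bracket in its patterns. A sketch with a hypothetical helper name; note that [:alnum:] follows the current locale, so it may keep accented letters that an explicit a-zA-Z range would drop.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete every character that is not alphanumeric or "_".
keep_word_chars() { printf '%s\n' "${1//[^[:alnum:]_]/}"; }

keep_word_chars "my #1 example!"   # my1example
keep_word_chars "a_b-c.d"          # a_bcd
```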
Part 2
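Capturing works the same in sed as it does in PHP's regex: enclosing part of the pattern in parentheses captures it. With -r (extended regex) plain parentheses are used; in sed's default basic regex syntax they must be written \( \):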
# swap foo and bar's number using capturing and back reference.
$ echo 'foo1 bar2' | sed -r 's/foo([0-9]+) bar([0-9]+)/foo\2 bar\1/'
foo2 bar1
$
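For a sed without -r (it is a GNU option; many BSD seds use -E instead), the same swap can be sketched as a POSIX basic regular expression: groups become \( \), and + is spelled out as [0-9][0-9]* (one digit followed by zero or more). The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: swap the numbers after foo and bar using BRE groups.
swap_numbers() {
  printf '%s\n' "$1" | sed 's/foo\([0-9][0-9]*\) bar\([0-9][0-9]*\)/foo\2 bar\1/'
}

swap_numbers 'foo1 bar2'    # foo2 bar1
swap_numbers 'foo10 bar7'   # foo7 bar10
```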
As an alternative to codaddict's nice answer using sed, you could also use tr for the first part of your question.
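echo "my #1 _ example!" | tr -dc '[:alnum:]_'
I've also made use of the [:alnum:] character class, just to show another option. Note that tr sets are not regex bracket expressions, so the class is written without extra brackets, and that -c complements the set, so the trailing newline is deleted as well.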
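What do you mean you can't use the regex syntax in bash? (Strictly speaking, ${var//pattern/} uses glob patterns rather than regex, but bracket expressions like this one work.)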
$ text="my #1 example!"
$ echo ${text//[^a-zA-Z0-9_]/}
my1example
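You have to use // for more than one replacement.
For your second question, with bash 3.2+ (leave the regex unquoted: since 3.2, a quoted right-hand side of =~ is matched as a literal string):
$ [[ $text =~ (my).*(example) ]]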
$ echo ${BASH_REMATCH[1]}
my
$ echo ${BASH_REMATCH[2]}
example
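To get every group without knowing their number in advance, one can loop over BASH_REMATCH, skipping index 0 (the whole match). A sketch with a hypothetical helper name; the regex is passed in a variable and expanded unquoted so it is treated as a regex.

```shell
#!/usr/bin/env bash
# Hypothetical helper: match $1 against the regex in $2 and list every capture group.
print_groups() {
  local text=$1 re=$2 i
  [[ $text =~ $re ]] || return 1
  # Index 0 holds the whole match; indexes 1..n hold the parenthesized groups.
  for ((i = 1; i < ${#BASH_REMATCH[@]}; i++)); do
    printf '%d: %s\n' "$i" "${BASH_REMATCH[i]}"
  done
}

print_groups "my #1 example!" '(my).*(example)'
# 1: my
# 2: example
```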

bash: assign grep regex results to array

I am trying to assign a regular expression result to an array inside of a bash script but I am unsure whether that's possible, or if I'm doing it entirely wrong. The below is what I want to happen, however I know my syntax is incorrect:
indexes[4]=$(echo b5f1e7bfc2439c621353d1ce0629fb8b | grep -o '[a-f0-9]\{8\}')
such that:
index[1]=b5f1e7bf
index[2]=c2439c62
index[3]=1353d1ce
index[4]=0629fb8b
Any links, or advice, would be wonderful :)
Here:
array=( $(echo b5f1e7bfc2439c621353d1ce0629fb8b | grep -o '[a-f0-9]\{8\}') )
$ echo "${array[@]}"
b5f1e7bf c2439c62 1353d1ce 0629fb8b
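The unquoted $(...) above relies on word splitting (and would also be subject to globbing). On bash 4 or newer, mapfile (also spelled readarray) reads one array element per output line instead. A sketch, assuming bash 4+; note that the indexes start at 0, not at 1 as in the question.

```shell
#!/usr/bin/env bash
# Requires bash 4+: each line printed by grep -o becomes one array element.
mapfile -t array < <(echo b5f1e7bfc2439c621353d1ce0629fb8b | grep -o '[a-f0-9]\{8\}')

printf '%s\n' "${array[@]}"   # one 8-digit chunk per line
echo "${#array[@]}"           # 4
echo "${array[0]}"            # b5f1e7bf
```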
#!/bin/bash
# Bash >= 3.2
hexstring="b5f1e7bfc2439c621353d1ce0629fb8b"
# build a regex to get four groups of eight hex digits
for i in {1..4}
do
regex+='([[:xdigit:]]{8})'
done
[[ $hexstring =~ $regex ]] # match the regex
array=("${BASH_REMATCH[@]}") # copy the match array which is readonly
unset 'array[0]' # so we can eliminate the full match and only use the parenthesized captured matches
for i in "${array[@]}"
do
echo "$i"
done
Here's a pure bash way, no external commands needed:
#!/bin/bash
declare -a array
s="b5f1e7bfc2439c621353d1ce0629fb8b"
# take 8 characters at a time until the end of the string
for ((i = 0; i < ${#s}; i += 8))
do
array+=("${s:i:8}")
done
echo "${array[@]}"
Output:
$ ./shell.sh
b5f1e7bf c2439c62 1353d1ce 0629fb8b