In a bash shell, I want to take a given string that matches a regex, and then extract part of that string.
For example, given https://github.com/PatrickConway/repo-name.git, I want to extract the repo-name substring.
How would I go about doing this? Should I do this all in a shell script, or is there another way to approach this?
You can use the =~ matching operator inside a [[ ... ]] condition:
#!/bin/bash
url=https://github.com/PatrickConway/repo-name.git
if [[ $url =~ ([^/]*)\.git ]] ; then
echo "${BASH_REMATCH[1]}"
fi
Each part enclosed in parentheses creates a capture group; the substring it matched can be found at the same position in the BASH_REMATCH array.
[...] defines a character class
[/] matches a character class consisting of a single character, a slash
^ negates a character class, [^/] matches anything but a slash
* means "zero or more times"
\. matches a dot, as . without a backslash matches any character
So, it reads: remember a substring of non-slashes, followed by a dot and "git".
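As a rough sketch, the same match can be wrapped in a function. The name repo_name and the trailing $ anchor are my own assumptions, not part of the answer above; the anchor just insists that ".git" ends the string.

```shell
#!/bin/bash
# Hypothetical helper: print the last path component of a URL, minus ".git".
repo_name() {
    local url=$1
    # ([^/]*) captures the run of non-slash characters just before ".git".
    # The trailing $ is an added assumption: ".git" must end the string.
    local re='([^/]*)\.git$'
    if [[ $url =~ $re ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}"
    else
        return 1
    fi
}

repo_name "https://github.com/PatrickConway/repo-name.git"
```

Keeping the pattern in a variable and expanding it unquoted sidesteps most quoting surprises inside [[ ... ]].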
Or maybe a simple parameter expansion:
#!/bin/bash
url=https://github.com/PatrickConway/repo-name.git
url_without_extension=${url%.git}
name=${url_without_extension##*/}
echo "$name"
% removes from the right and # removes from the left. Doubling the symbol makes the matching greedy, i.e. wildcards try to match as much as possible.
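To see all four operators side by side, here is a small sketch on a made-up path (the value and variable names are only for illustration):

```shell
#!/bin/bash
path=/a/b/c.tar.gz   # made-up value for illustration

shortest_prefix=${path#*/}    # drops the shortest leading match of "*/"
longest_prefix=${path##*/}    # drops the longest leading match of "*/"
shortest_suffix=${path%.*}    # drops the shortest trailing match of ".*"
longest_suffix=${path%%.*}    # drops the longest trailing match of ".*"

printf '%s\n' "$shortest_prefix" "$longest_prefix" "$shortest_suffix" "$longest_suffix"
```

The single forms remove as little as possible and the doubled forms as much as possible, which is why ## is the one that strips everything up to the last slash.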
Here's a bashy way of doing it:
var="https://github.com/PatrickConway/repo-name.git"
basevar=${var##*/}
echo "${basevar%.*}"
...which gives repo-name
What is +$ in this command:
[[ $1 =~ ^[0-9]+$ ]]
The + applies to the [0-9] and not the $.
The intended command was:
[[ $1 =~ ^[0-9]+$ ]]
It checks if $1 only contains digits, e.g. 123 or 9 (but not 123f or foo or empty string).
It breaks down as:
[[, start of a Bash extended test command
$1, the first parameter
=~, the Bash extended test command regex match operator
^[0-9]+$, the regex to match against:
^, anchor matching the start of the string
[0-9]+, one or more digits
[0-9], a digit
+, one or more of the preceding atom
$, anchor matching the end of the string
]] to terminate the test command
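A quick way to convince yourself of the breakdown is to run the test against a few inputs. This is only a sketch; the function name is_all_digits is made up.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the argument is one or more digits.
is_all_digits() {
    [[ $1 =~ ^[0-9]+$ ]]
}

# Try the examples mentioned above, including the empty string.
for candidate in 123 9 123f foo ""; do
    if is_all_digits "$candidate"; then
        printf '"%s": digits only\n' "$candidate"
    else
        printf '"%s": rejected\n' "$candidate"
    fi
done
```

Without the ^ and $ anchors, 123f would be accepted, because the regex would only need to find digits somewhere in the string.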
In a regex, + means "one or more of the preceding pattern", and $ is the end-of-string anchor.
^ is beginning of string anchor (the natural complement to $), and [0-9] matches any single digit (in the range of 0 to 9).
$cmd =~ s#-fp [^ ]+##;
Can anyone tell me what this regex means in Perl?
I couldn't find any regex like the one above by googling...
This removes the -fp optional parameter and its value from the command.
This takes the string stored by variable $cmd and replaces a section matching -fp [^ ]+ with nothing.
This command relies on the fact that Perl substitution (like other regex operators) can use any delimiter character. What is normally written as s/.../.../ is s#...#...# here. That may be the source of confusion.
=~ is a binary binding operator: the left argument is the string that the right argument operates on, in this case a substitution.
-fp [^ ]+
-fp matches literally.
[^ ]+ matches one or more characters which are not space.
Let's get the easy bit out of the way first. The $cmd =~ simply means "do the substitution on the variable $cmd".
Not all of this expression is a regex. It's actually the substitution operator - s/REGEX/STRING/. It matches the REGEX and replaces it with the STRING.
Like many similar operators in Perl, the substitution operator allows you to choose the delimiter character that you use. In this case, the programmer has made the slightly bizarre choice to use #.
So, we have this:
$cmd =~ s/-fp [^ ]+//;
And we now know what it means: "Match the variable $cmd against the regex -fp [^ ]+ and replace the match with an empty string". Why an empty string? Because the replacement string bit (between the second and third /) is an empty string.
All we need to do now is to understand the actual regex - -fp [^ ]+. And it's not very complicated.
-fp - the first four characters (up to and including the space) match themselves. So this matches the literal string "-fp ".
[^ ] - this is a "character class". Normally, it means "match any of the characters inside [...]". But the ^ at the start inverts that meaning to "match any character except the ones between [^...]". So this matches anything that isn't a space.
+ - this is a quantifier that means "match one or more of the previous expression".
So, put together, this is "match the string '-fp ' followed by one or more non-space characters".
And, adding in the rest of the expression, we get:
Look at the string in $cmd. If you find the string '-fp ' followed by one or more non-space characters, then replace the matched portion with an empty string.
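If it helps to experiment from a shell rather than Perl, sed accepts the same idea of an arbitrary delimiter after s. This is an analogue, not the Perl code from the question; the command line in cmd is made up.

```shell
#!/bin/bash
# Made-up command line for illustration.
cmd='tool -fp /tmp/x -v input.txt'

# sed, like Perl, lets the character after "s" act as the delimiter; "#" is used here.
# -E selects extended regular expressions, so "+" means "one or more".
stripped=$(printf '%s\n' "$cmd" | sed -E 's#-fp [^ ]+##')

printf '%s\n' "$stripped"
```

Note that only the option and its value are removed, so the spaces on either side of them stay behind as a double space.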
I'm trying to make sure the input to my shell script follows the format Name_Major_Minor.extension
where Name is any number of digits/characters/"-" followed by "_"
Major is any number of digits followed by "_"
Minor is any number of digits followed by "."
and Extension is any number of characters followed by the end of the file name.
I'm fairly certain my regular expression is just messed up slightly. Any file I currently run through it evaluates to "yes", but if I add "[A-Z]$" instead of "*$" it always evaluates to "no". Regular expressions confuse the hell out of me, as you can probably tell.
if echo $1 | egrep -q [A-Z0-9-]+_[0-9]+_[0-9]+\.*$
then
echo "yes"
else
echo "nope"
exit
fi
edit: realized I am missing the pattern for "minor". Still doesn't work after adding it though.
Use =~ operator
Bash supports regular expression matching through its =~ operator, and there is no need for egrep in this particular case:
if [[ "$1" =~ ^[A-Za-z0-9-]+_[0-9]+_[0-9]+\..*$ ]]
Errors in your regular expression
The \.*$ sequence in your regular expression means "zero or more dots". You probably meant "a dot and some characters after it", i.e. \..*$.
Your regular expression is anchored only at the end of the string ($), so it can match just a trailing portion of the input. To match the entire string, also anchor the beginning with ^.
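Putting both fixes together, a minimal sketch of the check might look like this. The function name is_valid_name and the sample file names in the usage line are assumptions for illustration.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the argument looks like Name_Major_Minor.extension.
is_valid_name() {
    # Anchored at both ends; "\." is a literal dot and ".*" is the extension.
    local re='^[A-Za-z0-9-]+_[0-9]+_[0-9]+\..*$'
    [[ $1 =~ $re ]]
}

if is_valid_name "my-app_1_2.tar"; then
    echo "yes"
else
    echo "nope"
fi
```

Because of the ^ anchor, something like "junk my-app_1_2.tar" is rejected, whereas the unanchored original would have accepted it.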
Escape the command line arguments
If you still want to use egrep, quote the pattern, as you should any command-line argument, so that the shell does not reinterpret its special characters. Single or double quotes both work, e.g.:
if echo "$1" | egrep -q '^[A-Za-z0-9-]+_[0-9]+_[0-9]+\..*$'
Use printf instead of echo
Don't use echo, as its behavior is considered unreliable. Use printf instead:
printf '%s\n' "$1"
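Try this regex instead: ^[A-Za-z0-9-]+(?:_[0-9]+){2}\..+$.
Note that (?:...) is Perl-style (PCRE) syntax. With egrep or Bash's =~, which use POSIX extended regular expressions, write a plain group instead: ^[A-Za-z0-9-]+(_[0-9]+){2}\..+$.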
[A-Za-z0-9-]+ matches Name
_[0-9]+ matches _ followed by one or more digits
(?:...){2} matches the group two times: _Major_Minor
\..+ matches a period followed by one or more characters
The problem in your regex is at the end: \.* matches a period (\.) any number of times, including zero. Also, [A-Z0-9-] matches only uppercase letters (plus digits and hyphens), which might not be what you wanted.
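Here is a sketch of how the repeated-group version could be used from a script. It assumes grep -E (the modern spelling of egrep) and an ordinary group rather than (?:...), since POSIX extended regular expressions have no non-capturing groups. The function name matches_format is made up.

```shell
#!/bin/bash
# Hypothetical helper using grep -E instead of egrep.
matches_format() {
    # (_[0-9]+){2} covers _Major_Minor; "\..+" needs at least one character after the dot.
    printf '%s\n' "$1" | grep -Eq '^[A-Za-z0-9-]+(_[0-9]+){2}\..+$'
}

if matches_format "Report-A_10_3.txt"; then
    echo "yes"
else
    echo "nope"
fi
```

Unlike the \..*$ variant, this one rejects a name that ends right after the dot.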
I'm working in bash, chosen mainly so I could get some practice with it, and I have a string that I know matches the regex [:blank:]+([0-9]+)[:blank:]+([0-9]+)[:blank:]+$SOMETHING, assuming I got that right. (Whitespace, digits, whitespace, digits, whitespace, some string I've previously defined.) By "matches," I mean it includes this format as a substring.
Is there a way to set the two strings of digits to specific variables with just one regex matching?
$BASH_REMATCH contains the groups from the latest regex comparison done by [[.
$ [[ ' 123 456 ' =~ [[:blank:]]+([0-9]+)[[:blank:]]+([0-9]+)[[:blank:]]+ ]] && echo "*${BASH_REMATCH[1]}*${BASH_REMATCH[2]}*"
*123*456*
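To end up with named variables, copy the groups out right after the match. The sketch below uses made-up sample data and assumes the previously defined string (here called something) contains no regex metacharacters.

```shell
#!/bin/bash
# Made-up sample values; "something" stands in for the asker's $SOMETHING
# and is assumed to contain no regex metacharacters.
something='total'
line='   123   456   total'

# Build the regex in a variable, then expand it unquoted on the right of =~.
re="[[:blank:]]+([0-9]+)[[:blank:]]+([0-9]+)[[:blank:]]+${something}"
if [[ $line =~ $re ]]; then
    first=${BASH_REMATCH[1]}
    second=${BASH_REMATCH[2]}
    printf 'first=%s second=%s\n' "$first" "$second"
fi
```

Copy the values out before running another [[ ... =~ ... ]], since each match overwrites BASH_REMATCH.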
I am using the following regex to search for a string $word in the bigger string $referenceLine as follows :
$wordRefMatchCount =()= $referenceLine =~ /(?=\b$word\b)/g
The problem happens when my $word substring contains characters such as (. Perl treats them as part of the regex rather than as part of the string to match, and gives the following error:
Unmatched ( in regex; marked by <-- HERE in
m/( <-- HERE ?=\b( darsheel safary\b)/
at ./bleu.pl line 119, <REFERENCE> line 1.
Can someone please tell me a solution to this? I think if I could somehow get Perl to understand that we want to look for the whole $word as it is, without evaluating it, it might work out.
Use
$wordRefMatchCount =()= $referenceLine =~ /(?=\b\Q$word\E\b)/g
to tell the regex engine to treat every character in $word as a literal character.
\Q marks the start, \E marks the end of a literal string in Perl regex.
Alternatively, you could do
$quote_word = quotemeta($word);
and then use
$wordRefMatchCount =()= $referenceLine =~ /(?=\b$quote_word\b)/g
One more thing (taken up here from the comments, where it's harder to find):
Your regex fails in your example case because of the word boundary anchor \b. This anchor matches between a word character and a non-word character. It only makes sense if placed around actual words, i.e. \bbar\b to ensure that only bar is matched, not foobar or barbaric. If you put it around non-words (as in \b( darsheel safary\b), then it will cause the match to fail (unless there is a letter, digit or underscore right before the ().
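For comparison only, Bash's =~ has the same pitfall and a similar escape hatch: quoting the right-hand side makes it match as a literal string. This is a Bash analogue, not Perl, and the sample strings and function name are made up.

```shell
#!/bin/bash
# Made-up sample data for illustration.
reference_line='starring ( darsheel safary ) and others'
word='( darsheel safary'

# Hypothetical helper: succeed if $2 occurs literally inside $1.
contains_literal() {
    # The quotes around "$2" make Bash treat it as a literal string, not a regex.
    [[ $1 =~ "$2" ]]
}

if contains_literal "$reference_line" "$word"; then
    echo "found"
else
    echo "not found"
fi
```

Left unquoted, the same $word would be parsed as a regex and rejected because of the unmatched parenthesis.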