What's the use of +$ in the given command? [duplicate] - regex

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 2 years ago.
What does +$ mean in this command?
[[ $1 =~ ^[0-9]+$ ]]

The + applies to the [0-9], not to the $. The command is:
[[ $1 =~ ^[0-9]+$ ]]
It checks whether $1 contains only digits, e.g. 123 or 9 (but not 123f, foo, or the empty string).
It breaks down as:
[[, start of a Bash extended test command
$1, the first positional parameter
=~, the regex match operator of the extended test command
^[0-9]+$, the regex to match against:
  ^, anchor matching the start of the string
  [0-9]+, one or more digits:
    [0-9], a digit
    +, one or more of the preceding atom
  $, anchor matching the end of the string
]], terminating the test command
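To try the test against the examples mentioned above, here is a minimal sketch that wraps it in a function. The name is_digits is a hypothetical helper chosen for illustration, not something Bash provides.

```shell
#!/usr/bin/env bash
# is_digits is a hypothetical helper name: it succeeds (exit status 0)
# only when its argument consists of one or more digits.
is_digits() {
  [[ $1 =~ ^[0-9]+$ ]]
}

# Run the examples from the answer through the test
for s in 123 9 123f foo ''; do
  if is_digits "$s"; then
    printf '%s -> digits only\n' "'$s'"
  else
    printf '%s -> rejected\n' "'$s'"
  fi
done
```

The empty string is rejected because + requires at least one digit between the ^ and $ anchors.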

In a regexp, + means "one or more times the preceding pattern", and $ is the end-of-string anchor.
^ is the beginning-of-string anchor (the natural complement to $), and [0-9] matches any single digit from 0 to 9.

Related

Testing string for repeated alphanumeric characters in bash with regex [duplicate]

This question already has answers here:
How to match repeated characters using regular expression operator =~ in bash?
(3 answers)
Closed 4 years ago.
I have a string "AbCdEfGG" and I need to test whether it contains a repeated alphanumeric character, using a regex in bash. This is the code I am using right now:
# Check if the password contains a repeated alphanumeric character
if [[ "$password_to_test" =~ ([a-zA-Z0-9])\1{2,} ]]; then
let score=score-10
echo "Password contains a repeated alphanumeric character (-10 points)"
else
echo "Password does not contain a repeated alphanumeric character"
fi
But it never subtracts 10 from the score. I need help with the regex pattern here.
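Bash's =~ does not support back-references on all platforms, because it relies on the ERE implementation of the system's regex library (thanks to @BenjaminW). POSIX ERE does not define back-references; some libraries, such as glibc, support them as an extension. Note also that \1{2,} requires at least two more copies of the captured character, i.e. three in a row, so it would not match the GG in AbCdEfGG even where back-references work.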
You may use this grep:
str='AbCdEfGG'
if grep -Eq '([[:alnum:]])\1' <<< "$str"; then
((score -= 10))
echo "Password contains a repeated alphanumeric character (-10 points)"
else
echo "Password does not contain a repeated alphanumeric character"
fi
It is better to use the POSIX bracket expression [[:alnum:]] instead of [a-zA-Z0-9].
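If back-references are not available at all, one portable alternative is to skip the regex and compare adjacent characters in a loop. The sketch below swaps in that technique; has_adjacent_repeat is a hypothetical name, and, like the grep answer, it treats two consecutive identical alphanumeric characters as a repeat.

```shell
#!/usr/bin/env bash
# has_adjacent_repeat is a hypothetical helper name. It succeeds when two
# identical alphanumeric characters sit next to each other in $1, without
# using regex back-references at all.
has_adjacent_repeat() {
  local s=$1 i c
  for (( i = 0; i < ${#s} - 1; i++ )); do
    c=${s:i:1}
    # Only count the pair if the character is alphanumeric
    if [[ $c == [[:alnum:]] && $c == "${s:i+1:1}" ]]; then
      return 0
    fi
  done
  return 1
}

score=0
if has_adjacent_repeat 'AbCdEfGG'; then
  ((score -= 10))
  echo "Password contains a repeated alphanumeric character (-10 points)"
fi
echo "score: $score"
```

The comparison is case-sensitive, so "Aa" does not count as a repeat, matching the behaviour of the grep version.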

Substring of string matching regex in a bash shell [duplicate]

This question already has answers here:
How to check if a string contains a substring in Bash
(29 answers)
Closed 4 years ago.
In a bash shell, I want to take a given string that matches a regex and extract part of it.
For example, given https://github.com/PatrickConway/repo-name.git, I want to extract the repo-name substring.
How would I go about doing this? Should I do this all in a shell script, or is there another way to approach this?
You can use the =~ matching operator inside a [[ ... ]] condition:
#!/bin/bash
url=https://github.com/PatrickConway/repo-name.git
if [[ $url =~ ([^/]*)\.git ]] ; then
echo "${BASH_REMATCH[1]}"
fi
Each part enclosed in parentheses creates a capture group; the substring it matched is stored at the corresponding index of the BASH_REMATCH array (index 0 holds the whole match).
[...] defines a character class
[/] matches a character class consisting of a single character, a slash
^ negates a character class, [^/] matches anything but a slash
* means "zero or more times"
\. matches a dot, as . without a backslash matches any character
So, it reads: remember a substring of non-slashes, followed by a dot and "git".
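With more than one capture group, each group lands at its own index of BASH_REMATCH. A minimal sketch, assuming the input is always an HTTPS github.com URL ending in .git; parse_github_url is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# parse_github_url: hypothetical helper. Assumes a URL of the form
# https://github.com/OWNER/REPO.git and prints "OWNER REPO".
parse_github_url() {
  # Keeping the regex in a variable avoids quoting problems inside [[ ]]
  local re='^https://github\.com/([^/]+)/([^/]+)\.git$'
  if [[ $1 =~ $re ]]; then
    # BASH_REMATCH[0] is the whole match; [1] and [2] are the two groups
    printf '%s %s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
  else
    return 1
  fi
}

parse_github_url https://github.com/PatrickConway/repo-name.git
```

Anchoring with ^ and $ makes the function reject URLs that merely contain a matching fragment.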
Or maybe a simple parameter expansion:
#!/bin/bash
url=https://github.com/PatrickConway/repo-name.git
url_without_extension=${url%.git}
name=${url_without_extension##*/}
echo "$name"
% removes a matching suffix (from the right) and # removes a matching prefix (from the left); doubling the symbol (%% or ##) makes the match greedy, i.e. the wildcards match as much as possible.
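The difference between the single and doubled forms is easiest to see side by side. The file name archive.tar.gz below is an illustrative value, not from the question:

```shell
#!/usr/bin/env bash
url=https://github.com/PatrickConway/repo-name.git
file=archive.tar.gz

echo "${url#*/}"    # shortest prefix matching */ removed: /github.com/PatrickConway/repo-name.git
echo "${url##*/}"   # longest prefix matching */ removed: repo-name.git
echo "${file%.*}"   # shortest suffix matching .* removed: archive.tar
echo "${file%%.*}"  # longest suffix matching .* removed: archive
```

This is why the answers use ## to strip the path but only a single % to strip the extension.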
Here's a bashy way of doing it:
var="https://github.com/PatrickConway/repo-name.git"
basevar=${var##*/}
echo "${basevar%.*}"
...which gives repo-name

Regular Expression to follow a specific pattern

I'm trying to make sure the input to my shell script follows the format Name_Major_Minor.extension
where Name is any number of digits/characters/"-" followed by "_"
Major is any number of digits followed by "_"
Minor is any number of digits followed by "."
and Extension is any number of characters followed by the end of the file name.
I'm fairly certain my regular expression is just slightly off. Any file I currently run through it evaluates to "yes", but if I use "[A-Z]$" instead of "*$" it always evaluates to "no". Regular expressions confuse the hell out of me, as you can probably tell.
if echo $1 | egrep -q [A-Z0-9-]+_[0-9]+_[0-9]+\.*$
then
echo "yes"
else
echo "nope"
exit
fi
Edit: I realized I was missing the pattern for "minor". It still doesn't work after adding it, though.
Use the =~ operator
Bash supports regular expression matching through its =~ operator, and there is no need for egrep in this particular case:
if [[ "$1" =~ ^[A-Za-z0-9-]+_[0-9]+_[0-9]+\..*$ ]]
Errors in your regular expression
The \.*$ sequence in your regular expression means "zero or more dots". You probably meant "a dot and some characters after it", i.e. \..*$.
Your regular expression is anchored only at the end of the string ($), so it accepts any input that merely ends with the pattern. To match the entire string, also anchor the beginning with ^.
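Both errors show up when the original pattern and the corrected one are run side by side. This sketch evaluates them with Bash's =~ rather than egrep (the regexes are stored in variables so the shell does not reinterpret them); the three sample names are made up for illustration.

```shell
#!/usr/bin/env bash
# The question's pattern: no ^ anchor, and \.* means "zero or more dots"
buggy='[A-Z0-9-]+_[0-9]+_[0-9]+\.*$'
# The corrected pattern: anchored at both ends, \. is exactly one dot
fixed='^[A-Za-z0-9-]+_[0-9]+_[0-9]+\..*$'

for name in 'APP_1_2.txt' 'APP_1_2' 'junk APP_1_2'; do
  b=no f=no
  if [[ $name =~ $buggy ]]; then b=yes; fi
  if [[ $name =~ $fixed ]]; then f=yes; fi
  printf '%-14s buggy=%s fixed=%s\n' "$name" "$b" "$f"
done
```

The buggy pattern rejects a valid name with an extension, because after the digits only dots may follow before the end, yet it accepts a name with no extension and one with junk in front.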
Quote the command-line arguments
If you still want to use egrep (grep -E is the modern equivalent), quote its pattern argument, as you should any command-line argument containing special characters, so the shell does not reinterpret them. Wrap it in single or double quotes, e.g.:
if echo "$1" | egrep -q '^[A-Za-z0-9-]+_[0-9]+_[0-9]+\..*$'
Use printf instead of echo
Don't use echo for arbitrary data: its handling of arguments such as -n and of backslash escapes varies between shells and configurations. Use printf instead:
printf '%s\n' "$1"
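A small demonstration of the problem, assuming bash's default configuration: if $1 happens to be -n, echo swallows it as an option, while printf passes it through as data.

```shell
#!/usr/bin/env bash
arg='-n'

# bash's builtin echo treats a lone "-n" as an option, so this prints nothing
echo "$arg"

# printf only interprets its first argument as the format; data arguments
# are printed as-is, so this prints -n
printf '%s\n' "$arg"
```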
Try this regex instead: ^[A-Za-z0-9-]+(_[0-9]+){2}\..+$
[A-Za-z0-9-]+ matches Name
_[0-9]+ matches _ followed by one or more digits
(...){2} matches the group two times: _Major_Minor
\..+ matches a period followed by one or more characters
The non-capturing form (?:...){2} is PCRE syntax and is not supported by egrep or Bash's =~, which use POSIX ERE; plain parentheses work in both, and grep -P accepts (?:...).
The problem in your regex seems to be at the end with \.*, which matches a period any number of times. Also, [A-Z0-9-] matches only uppercase letters, which might not be what you wanted.
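If you also need the individual parts, give each one its own capture group; a repeated group such as (_[0-9]+){2} only reports its last repetition. A minimal sketch using Bash's =~, where split_name is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# split_name: hypothetical helper. Prints Name, Major, Minor and extension
# on separate lines, or returns 1 if $1 is not Name_Major_Minor.extension.
split_name() {
  local re='^([A-Za-z0-9-]+)_([0-9]+)_([0-9]+)\.(.+)$'
  [[ $1 =~ $re ]] || return 1
  # Elements 1..4 of BASH_REMATCH hold the four capture groups
  printf '%s\n' "${BASH_REMATCH[@]:1}"
}

split_name 'my-app_1_20.tar.gz'
```

Because Name cannot contain a dot, the first dot after Minor starts the extension, so tar.gz is kept whole.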

Why do these two regexes behave differently?

Why do the following two regexes behave differently?
$millisec = "1391613310.1";
$millisec =~ s/.*(\.\d+)?$/$1/;
vs.
$millisec =~ s/\d*(\.\d+)?$/$1/;
This code prints nothing:
perl -e 'my $mtime = "1391613310.1"; my $millisec = $mtime; $millisec =~ s/.*(\.\d+)?$/$1/; print "$millisec";'
While this prints the decimal portion of the string:
perl -e 'my $mtime = "1391613310.1"; my $millisec = $mtime; $millisec =~ s/\d*(\.\d+)?$/$1/; print "$millisec";'
In the first regex, the .* takes up everything to the end of the string, so there is nothing left for the optional (\.\d+)? to pick up. $1 is empty, so the string is replaced by an empty string.
In the second regex, only digits are grabbed from the beginning, so \d* stops in front of the dot. (\.\d+)? then picks up the dot together with the trailing digits.
Inside the parentheses the dot must be escaped: an unescaped .\d+ matches any character followed by digits, while \.\d+ matches a literal dot.
To make the first regex behave similarly to the second one you would have to write
$millisec =~ s/.*?(\.\d+)?$/$1/;
so that the initial .*? (non-greedy) does not take up everything.
Greed.
Perl's regex engine matches as much as possible with each term before moving on to the next term. So for .*(\.\d+)?$ the .* matches the entire string, and then (\.\d+)? matches nothing, which is allowed because it is optional.
\d*(\.\d+)?$ can match only up to the dot, so (\.\d+)? then has to match .1.

How Regex engine parse anchors [duplicate]

This question already has answers here:
Short example of regular expression converted to a state machine?
(6 answers)
Closed 8 years ago.
Can someone explain how a regex engine works when it tries to match
^4$ against 749\n486\n4?
I mean, how does the regex engine parse the string while performing the match?
The regexp ^4$ means: match a line that contains only the digit 4.
By default, however, ^ matches only at the start of the string and $ only at the end of the string (or just before a final newline), so the newlines inside the string are not treated as line boundaries. ^4 would require the string to begin with 4, so the match fails. Example in Perl:
DB<1> $str="749\n486\n4";
DB<2> x $str =~ /^4$/
empty array
Example in Python:
>>> import re
>>> s="749\n486\n4"
>>> re.search('^4$',s)
However, regexp implementations have a way of dealing with this: a multiline mode. In Perl:
DB<3> x $str =~ /^4$/m
0 1
In Python:
>>> re.search('^4$',s,re.MULTILINE)
<_sre.SRE_Match object at 0x7f446874b030>
The Python docs explain multiline mode like this:
re.MULTILINE
When specified, the pattern character '^' matches at the beginning of the string and at the beginning of each line (immediately following each newline); and the pattern character '$' matches at the end of the string and at the end of each line (immediately preceding each newline). By default, '^' matches only at the beginning of the string, and '$' only at the end of the string and immediately before the newline (if any) at the end of the string.
If you actually want to know whether your multiline string ends with a line containing only the digit 4, use \z, which matches only at the very end of the string:
DB<4> x $str =~ /^4\z/m
0 1
See http://perldoc.perl.org/perlre.html, especially on the m flag and \A, \z, \Z
or http://docs.python.org/2/library/re.html#regular-expression-objects
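The same whole-string versus per-line distinction shows up in the shell. Bash's =~ applies the regex to the whole string, much like the default (non-multiline) mode above, whereas grep applies it to each line separately. A small sketch:

```shell
#!/usr/bin/env bash
str=$'749\n486\n4'

# =~ sees one string: ^ and $ only match at its very start and end,
# and the embedded newlines are ordinary characters.
if [[ $str =~ ^4$ ]]; then echo "=~ matched"; else echo "=~ did not match"; fi

# grep tests every line on its own, so ^4$ matches the last line;
# -c prints the number of matching lines.
printf '%s\n' "$str" | grep -c '^4$'
```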