Testing string for repeated alphanumeric characters in bash with regex [duplicate]

I have a string "AbCdEfGG" and I need to test whether it contains repeated alphanumeric characters using a regex in bash. This is the code I am using right now.
# Check if the password contains a repeated alphanumeric character
if [[ "$password_to_test" =~ ([a-zA-Z0-9])\1{2,} ]]; then
let score=score-10
echo "Password contains a repeated alphanumeric character (-10 points)"
else
echo "Password does not contain a repeated alphanumeric character"
fi
But it never decrements 10 from the score. I need help with the regex pattern here.

Bash's =~ operator doesn't support back-references on all platforms, because it depends on the ERE implementation of the underlying system's regex library (thanks to @BenjaminW).
You may use this grep:
str='AbCdEfGG'
if grep -Eq '([[:alnum:]])\1' <<< "$str"; then
((score -= 10))
echo "Password contains a repeated alphanumeric character (-10 points)"
else
echo "Password does not contain a repeated alphanumeric character"
fi
It is better to use the POSIX bracket expression [[:alnum:]] instead of [a-zA-Z0-9].
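If neither back-references in =~ nor grep are available, the same check can be sketched in pure bash by comparing each character with the one before it. This is only an illustration: the function name has_adjacent_repeat is made up for this example, and like the grep pattern above it flags two identical adjacent alphanumeric characters.

```shell
#!/bin/bash
# has_adjacent_repeat is a hypothetical helper, not part of the original answer.
# Returns 0 if the argument contains two identical adjacent alphanumeric characters.
has_adjacent_repeat() {
    local s=$1 prev='' ch i
    for (( i = 0; i < ${#s}; i++ )); do
        ch=${s:i:1}
        # Same as the previous character, and alphanumeric.
        if [[ $ch == "$prev" && $ch == [[:alnum:]] ]]; then
            return 0
        fi
        prev=$ch
    done
    return 1
}

if has_adjacent_repeat 'AbCdEfGG'; then
    echo "Password contains a repeated alphanumeric character"
else
    echo "Password does not contain a repeated alphanumeric character"
fi
```

The loop avoids any dependency on the regex library, at the cost of being slower than grep on very long strings.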

Related

What's the use of +$ in the given command? [duplicate]

What is +$ in this command:
[[ $1 =~ ^[0-9]+$ ]]
The + applies to the [0-9] and not the $.
The intended command was:
[[ $1 =~ ^[0-9]+$ ]]
It checks if $1 only contains digits, e.g. 123 or 9 (but not 123f or foo or empty string).
It breaks down as:
[[, start of a Bash extended test command
$1, the first parameter
=~, the Bash extended test command regex match operator
^[0-9]+$, the regex to match against:
  ^, anchor matching the start of the line
  [0-9]+, one or more digits
    [0-9], a digit
    +, one or more of the preceding atom
  $, anchor matching the end of the line
]] to terminate the test command
In a regexp, + means "the preceding pattern, one or more times", and $ is the end-of-string anchor.
^ is the beginning-of-string anchor (the natural complement to $), and [0-9] matches any single digit (in the range 0 to 9).
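To see the pattern in action, here is a small sketch that runs it against the sample values mentioned above. The wrapper name is_all_digits is invented for this example.

```shell
#!/bin/bash
# is_all_digits is a hypothetical wrapper around the test from the question.
is_all_digits() {
    [[ $1 =~ ^[0-9]+$ ]]
}

# 123 and 9 pass; 123f, foo and the empty string do not.
for value in 123 9 123f foo ''; do
    if is_all_digits "$value"; then
        echo "'$value' is all digits"
    else
        echo "'$value' is not all digits"
    fi
done
```

The empty string is rejected because + requires at least one digit; with * instead, it would be accepted.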

Substring of string matching regex in a bash shell [duplicate]

In a bash shell, I want to take a given string that matches a regex, and then extract part of that string.
For example, given https://github.com/PatrickConway/repo-name.git, I want to extract the repo-name substring.
How would I go about doing this? Should I do this all in a shell script, or is there another way to approach this?
You can use the =~ matching operator inside a [[ ... ]] condition:
#!/bin/bash
url=https://github.com/PatrickConway/repo-name.git
if [[ $url =~ ([^/]*)\.git ]] ; then
echo "${BASH_REMATCH[1]}"
fi
Each part enclosed in parentheses creates a capture group; the corresponding matching substring can be found at the same position in the BASH_REMATCH array.
[...] defines a character class
[/] matches a character class consisting of a single character, a slash
^ negates a character class, [^/] matches anything but a slash
* means "zero or more times"
\. matches a dot, as . without a backslash matches any character
So, it reads: remember a substring of non-slashes, followed by a dot and "git".
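As a rough sketch, the same match can be wrapped in a function so that the caller gets a failure status when the URL does not fit. The name repo_name_from_url is an assumption made for this example; the regex is the one from above, stored in a variable. Index 0 of BASH_REMATCH holds the whole matched text and index 1 the first capture group.

```shell
#!/bin/bash
# repo_name_from_url is a hypothetical helper wrapping the match shown above.
repo_name_from_url() {
    local url=$1
    local re='([^/]*)\.git'   # same regex as above, kept in a variable
    if [[ $url =~ $re ]]; then
        # BASH_REMATCH[0] is the whole match, BASH_REMATCH[1] the first group.
        printf '%s\n' "${BASH_REMATCH[1]}"
    else
        return 1
    fi
}

repo_name_from_url 'https://github.com/PatrickConway/repo-name.git'   # prints repo-name
```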
Or maybe a simple parameter expansion:
#!/bin/bash
url=https://github.com/PatrickConway/repo-name.git
url_without_extension=${url%.git}
name=${url_without_extension##*/}
echo "$name"
% removes from the right, # removes from the left, doubling the symbol makes the matching greedy, i.e. wildcards try to match as much as possible.
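The difference between the single and doubled operators is easiest to see on a value with more than one dot. The sample value below is made up for illustration.

```shell
#!/bin/bash
file='archive.tar.gz'   # hypothetical sample value

echo "${file%.*}"    # shortest match of .* removed from the right: archive.tar
echo "${file%%.*}"   # longest match of .* removed from the right:  archive
echo "${file#*.}"    # shortest match of *. removed from the left:  tar.gz
echo "${file##*.}"   # longest match of *. removed from the left:   gz
```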
Here's a bashy way of doing it:
var="https://github.com/PatrickConway/repo-name.git"
basevar=${var##*/}
echo "${basevar%.*}"
...which gives repo-name

Regular Expression to follow a specific pattern

I'm trying to make sure the input to my shell script follows the format Name_Major_Minor.extension
where Name is any number of digits/characters/"-" followed by "_"
Major is any number of digits followed by "_"
Minor is any number of digits followed by "."
and Extension is any number of characters followed by the end of the file name.
I'm fairly certain my regular expression is just messed up slightly. Any file I currently run through it evaluates to "yes", but if I add "[A-Z]$" instead of "*$" it always evaluates to "no". Regular expressions confuse the hell out of me, as you can probably tell.
if echo $1 | egrep -q [A-Z0-9-]+_[0-9]+_[0-9]+\.*$
then
echo "yes"
else
echo "nope"
exit
fi
edit: realized I am missing the pattern for "minor". Still doesn't work after adding it though.
Use =~ operator
Bash supports regular expression matching through its =~ operator, and there is no need for egrep in this particular case:
if [[ "$1" =~ ^[A-Za-z0-9-]+_[0-9]+_[0-9]+\..*$ ]]
Errors in your regular expression
The \.*$ sequence in your regular expression means "zero or more dots". You probably meant "a dot and some characters after it", i.e. \..*$.
Your regular expression is anchored only at the end of the string ($), so it can match just a trailing part of the input. You likely want to match the whole string. To do that, also add the ^ anchor at the beginning.
Escape the command line arguments
If you still want to use egrep, quote the pattern, as you should with any command-line argument, so that the shell does not reinterpret its special characters. Wrap it in single or double quotes, e.g.:
if echo "$1" | egrep -q '^[A-Za-z0-9-]+_[0-9]+_[0-9]+\..*$'
Use printf instead of echo
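Don't use echo, as its behavior is unreliable: depending on the shell and its options, it may treat arguments such as -n or -e as options or expand backslash sequences. Use printf instead: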
printf '%s\n' "$1"
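Putting the quoting and printf advice together, the grep-based check might look like the sketch below. The function name matches_name_format and the sample file name are invented for this example, and grep -E is used as the equivalent of egrep.

```shell
#!/bin/bash
# matches_name_format is a hypothetical helper combining the advice above.
matches_name_format() {
    # printf passes the value through unchanged, even if it starts with "-".
    printf '%s\n' "$1" | grep -Eq '^[A-Za-z0-9-]+_[0-9]+_[0-9]+\..*$'
}

if matches_name_format 'Report-A_12_3.txt'; then
    echo "yes"
else
    echo "nope"
fi
```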
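Try this regex instead: ^[A-Za-z0-9-]+(?:_[0-9]+){2}\..+$. Note that (?:...) is a non-capturing group from Perl-style regex syntax (for example grep -P); egrep and bash's =~ use POSIX ERE, which has no non-capturing groups, so write a plain (_[0-9]+){2} there.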
[A-Za-z0-9-]+ matches Name
_[0-9]+ matches _ followed by one or more digits
(?:...){2} matches the group two times: _Major_Minor
\..+ matches a period followed by one or more characters
The problem in your regex seems to be at the end with \.*, which matches a period \. any number of times. Also, [A-Z0-9-] will only match uppercase letters, which might not be what you wanted.
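For a pure-bash variant, here is a sketch that validates the name and pulls out its parts in one step. It uses ordinary capture groups, since bash's =~ works with POSIX ERE. The function name parse_versioned_name and the sample file name are assumptions made for this example.

```shell
#!/bin/bash
# parse_versioned_name is a hypothetical helper; the sample file name is made up.
parse_versioned_name() {
    local re='^([A-Za-z0-9-]+)_([0-9]+)_([0-9]+)\.(.+)$'
    [[ $1 =~ $re ]] || return 1
    # Groups 1..4 are Name, Major, Minor and extension, in that order.
    printf 'name=%s major=%s minor=%s ext=%s\n' \
        "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" \
        "${BASH_REMATCH[3]}" "${BASH_REMATCH[4]}"
}

parse_versioned_name 'my-app_2_15.tar.gz'   # prints name=my-app major=2 minor=15 ext=tar.gz
```

Keeping the regex in a variable avoids quoting problems inside [[ ... ]].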

Only allow some characters with grep?

I would like to check a string, so it only contains the characters 0-9 a-z -.
When I do
regex='[-a-z0-9]*'
string='abcd!'
if [[ $string =~ $regex ]]
then
echo "valid"
else
echo "not valid"
fi
it outputs valid, where I would have expected not valid because $string contains a !.
Try this: regex='^[-a-z0-9]*$'. It forces the complete string to match this class. Otherwise, a match anywhere in the string, or even an empty match (due to *), is enough to return valid. ^...$ says that nothing outside the class may appear between the start and the end of the string.
You will have to add boundaries for this regex to work.
'[-a-z0-9]*' says: match these characters 0 or more times anywhere in the string.
So adding the start-of-line and end-of-line anchors to the regex will do what you are looking for:
regex='^[-a-z0-9]*$'
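The next step is to limit the '-' to at most one occurrence. Can the dash character occur at the start or at the end of the string? The following pattern allows at most one dash, but still accepts it in any position:
regex='^[a-z0-9]*-?[a-z0-9]*$'
If the dash must not be the first or last character, try this instead (it also requires at least one character):
regex='^[a-z0-9]+(-[a-z0-9]+)?$'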
Hope this helps.
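The effect of the anchors can be checked with a short sketch like the one below. The helper name is_valid is made up for this example; the string is the one from the question.

```shell
#!/bin/bash
# is_valid is a hypothetical helper using the anchored regex.
is_valid() {
    local re='^[-a-z0-9]*$'
    [[ $1 =~ $re ]]
}

string='abcd!'
unanchored='[-a-z0-9]*'

# Without anchors, the leading part of the string is enough for a match.
[[ $string =~ $unanchored ]] && echo "unanchored: valid (matched '${BASH_REMATCH[0]}')"

# With anchors, the trailing ! makes the whole test fail.
is_valid "$string" || echo "anchored: not valid"
```

Because of the *, the anchored pattern still accepts an empty string; use + if at least one character is required.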

How can I extract specific parts of a string matching a specific regex in bash?

I'm working in bash, chosen mainly so I could get some practice with it, and I have a string that I know matches the regex [:blank:]+([0-9]+)[:blank:]+([0-9]+)[:blank:]+$SOMETHING, assuming I got that right. (Whitespace, digits, whitespace, digits, whitespace, some string I've previously defined.) By "matches," I mean it includes this format as a substring.
Is there a way to set the two strings of digits to specific variables with just one regex matching?
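The BASH_REMATCH array holds the results of the latest regex comparison done by [[: index 0 is the whole match, and indices 1 and up are the capture groups. Note that the character class has to be written [[:blank:]], with double brackets.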
$ [[ ' 123 456 ' =~ [[:blank:]]+([0-9]+)[[:blank:]]+([0-9]+)[[:blank:]]+ ]] && echo "*${BASH_REMATCH[1]}*${BASH_REMATCH[2]}*"
*123*456*
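To store the two numbers in variables and include the previously defined string, something along these lines should work. This is a sketch under assumptions: the function name extract_two_numbers, the variable names first and second, and the sample values are all invented for illustration. The suffix is quoted inside [[ ... ]] so that bash matches it literally rather than as a regex.

```shell
#!/bin/bash
# extract_two_numbers is a hypothetical helper; it sets the global
# variables first and second when the line matches.
extract_two_numbers() {
    local line=$1 suffix=$2
    local prefix='[[:blank:]]+([0-9]+)[[:blank:]]+([0-9]+)[[:blank:]]+'
    # The unquoted $prefix is treated as a regex; the quoted "$suffix" is literal.
    [[ $line =~ $prefix"$suffix" ]] || return 1
    first=${BASH_REMATCH[1]}
    second=${BASH_REMATCH[2]}
}

if extract_two_numbers '  123   456  widgets' 'widgets'; then
    echo "first=$first second=$second"   # prints first=123 second=456
fi
```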