How can I use the following regex in a BASH script?
(?=^.{8,255}$)((?=.*\d)(?!.*\s)(?=.*[A-Z])(?=.*[a-z]))^.*
I need to check the user input (a password) for the following:
at least one Capital Letter.
at least one number.
at least one small letter.
and the password should be between 8 and 255 characters long.
If your version of grep has the -P option, it supports PCRE (Perl-Compatible Regular Expressions).
grep -P '(?=^.{8,255}$)(?=^[^\s]*$)(?=.*\d)(?=.*[A-Z])(?=.*[a-z])'
I had to change your expression to reject spaces since it always failed. The extra set of parentheses didn't seem necessary. I left off the ^.* at the end since that always matches and you really only need the boolean result, like this:
while ! echo "$password" | grep -P ...
do
read -r -s -p "Please enter a password: " password
done
I don't think that your regular expression is the best (or correct?) way to check the things on your list (hint: I'd check the length independently of the other conditions), but to answer the question about using it in Bash: use the return value of grep -Eq, e.g.:
if echo "$candidate_password" | grep -Eq "$strong_pw_regex"; then
echo strong
else
echo weak
fi
Alternatively in Bash 3 and later you can use the =~ operator:
if [[ "$candidate_password" =~ $strong_pw_regex ]]; then
…
fi
The regexp syntax of grep -E or Bash does not necessarily support all the things you are using in your example, but it is possible to check your requirements with either. But if you want fancier regular expressions, you'll probably need to substitute something like Ruby or Perl for Bash.
As for modifying your regular expression, check the length with Bash (${#candidate_password} gives you the length of the string in the variable candidate_password) and then use a simple syntax with no lookahead. You could even check all three conditions with separate regular expressions for simplicity.
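A minimal sketch of that approach, assuming a hypothetical helper named is_strong_password (the name is not from the original answer). It checks the length with ${#...} and then runs one simple grep pattern per requirement, using POSIX character classes instead of lookahead:

```shell
#!/bin/sh
# is_strong_password is a hypothetical helper name used for illustration.
# Returns 0 if the candidate meets all requirements, 1 otherwise.
is_strong_password() {
    candidate_password=$1

    # Length check done by the shell itself: 8 to 255 characters.
    [ "${#candidate_password}" -ge 8 ] || return 1
    [ "${#candidate_password}" -le 255 ] || return 1

    # One simple pattern per requirement; every grep must succeed.
    printf '%s\n' "$candidate_password" | grep -q '[[:upper:]]' || return 1
    printf '%s\n' "$candidate_password" | grep -q '[[:lower:]]' || return 1
    printf '%s\n' "$candidate_password" | grep -q '[[:digit:]]' || return 1
    return 0
}

if is_strong_password 'tEsTstr1ng'; then
    echo strong
else
    echo weak
fi
```

This sketch does not reject whitespace, which the original lookahead expression did; a fourth check could be added if that is wanted.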
These matches are connected with the logical AND operator, which means the only good match is when all of them match.
Therefore the simplest way is to match those conditions chained, with the previous result piped into the next expression. Then if any of the matches fail, the entire expression fails:
$ echo "tEsTstr1ng" | egrep '^.{8,255}$' | egrep "[ABCDEFGHIJKLMNOPQRSTUVWXYZ]" | egrep "[abcdefghijklmnopqrstuvwxyz]" | egrep "[0-9]"
I manually entered all characters instead of "[A-Z]" and "[a-z]" because different system locales might substitute them as [aAbBcC..., which is two conditions in one match and we need to check for both conditions.
As shell script:
#!/bin/sh
a="tEsTstr1ng"
b=`echo "$a" | egrep '^.{8,255}$' | \
egrep "[ABCDEFGHIJKLMNOPQRSTUVWXYZ]" | \
egrep "[abcdefghijklmnopqrstuvwxyz]" | \
egrep "[0-9]"`
# now featuring W in the alphabet string
# if the result string is empty, one of the conditions has failed
if [ -z "$b" ]
then
echo "Conditions do not match"
else
echo "Conditions match"
fi
grep with the -E option uses extended regular expressions (ERE). According to this documentation, ERE does not support lookahead.
So you can use Perl for this as:
perl -ne 'exit 1 if(/(?=^.{8,255}$)((?=.*\d)(?=.*[A-Z])(?=.*[a-z])|(?=.*\d)(?=.*[^A-Za-z0-9])(?=.*[a-z])|(?=.*[^A-Za-z0-9])(?=.*[A-Z])(?=.*[a-z])|(?=.*\d)(?=.*[A-Z])(?=.*[^A-Za-z0-9]))^.*/);exit 0;'
I get that you are looking for a regex, but have you considered doing it through a PAM module?
dictionary
quality
There might be other interesting modules.
Related
I have the following string /path/to/my-jar-1.0.jar for which I am trying to write a bash regex to pull out my-jar.
Now I believe the following regex would work: ([^\/]*?)-\d but I don't know how to get bash to run it.
The following: echo '/path/to/my-jar-1.0.jar' | grep -Po '([^\/]*?)-\d' captures my-jar-1
In BASH you can do:
s='/path/to/my-jar-1.0.jar'
[[ $s =~ .*/([^/[:digit:]]+)-[[:digit:]] ]] && echo "${BASH_REMATCH[1]}"
my-jar
Here "${BASH_REMATCH[1]}" prints captured group #1, which is the text matched by the expression inside the first (...).
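As a further sketch of how BASH_REMATCH indexes groups, the following uses two capture groups to pull out both the name and the version. The pattern and the variable names jar_name and jar_version are illustrative assumptions, not part of the original answer:

```shell
#!/usr/bin/env bash
# Sketch: split a jar path into artifact name and version with two groups.
# The pattern and variable names are illustrative assumptions.
s='/path/to/my-jar-1.0.jar'
re='.*/([^/]+)-([0-9]+(\.[0-9]+)*)\.jar$'

if [[ $s =~ $re ]]; then
    jar_name=${BASH_REMATCH[1]}     # text matched by the first (...) group
    jar_version=${BASH_REMATCH[2]}  # text matched by the second (...) group
    echo "name=$jar_name version=$jar_version"
else
    echo "no match" >&2
fi
```

Element 0 of the array holds the whole match; groups are numbered by the position of their opening parenthesis.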
You can do this as well with shell prefix and suffix removal:
$ path=/path/to/my-jar-1.0.jar
# Remove the longest prefix ending with a slash
$ base="${path##*/}"
# Remove the longest suffix starting with a dash followed by a digit
$ base="${base%%-[0-9]*}"
$ echo "$base"
my-jar
Although it's a little annoying to have to do the transform in two steps, it has the advantage of only using POSIX features, so it will work with any compliant shell.
Note: The order is important, because the basename cannot contain a slash, but a path component could contain a dash. So you need to remove the path components first.
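To illustrate why the order matters, here is a small sketch with a made-up path whose directory name also contains a dash followed by a digit. The path and the variable names are assumptions chosen for the demonstration:

```shell
#!/bin/sh
# Hypothetical path whose directory name also contains "-<digit>".
path=/opt/app-2/my-jar-1.0.jar

# Correct order: strip the directories first, then the version suffix.
base=${path##*/}          # my-jar-1.0.jar
right=${base%%-[0-9]*}    # my-jar

# Wrong order: the suffix removal already cuts at "-2" in the directory.
cut=${path%%-[0-9]*}      # /opt/app
wrong=${cut##*/}          # app

echo "right: $right"
echo "wrong: $wrong"
```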
grep -o doesn't recognize "capture groups" I think, just the entire match. That said, with Perl regexps (-P) you have the "lookahead" option to exclude the -\d from the match:
echo '/path/to/my-jar-1.0.jar' | grep -Po '[^/]*(?=-\d)'
Some reference material on lookahead/lookbehind:
http://www.perlmonks.org/?node_id=518444
I want to match an input string (contained in the variable $1) with a regex representing the date formats MM/DD/YYYY and MM-DD-YYYY.
REGEX_DATE="^\d{2}[\/\-]\d{2}[\/\-]\d{4}$"
echo "$1" | grep -q $REGEX_DATE
echo $?
The echo $? returns the error code 1 no matter the input string.
To complement the existing helpful answers:
Using Bash's own regex-matching operator, =~, is a faster alternative in this case, given that you're only matching a single value already stored in a variable:
set -- '12-34-5678' # set $1 to sample value
kREGEX_DATE='^[0-9]{2}[-/][0-9]{2}[-/][0-9]{4}$' # note use of [0-9] to avoid \d
[[ $1 =~ $kREGEX_DATE ]]
echo $? # 0 with the sample value, i.e., a successful match
Note, however, that the caveat re using flavor-specific regex constructs such as \d equally applies:
While =~ supports EREs (extended regular expressions), it also supports the host platform's specific extension - it's a rare case of Bash's behavior being platform-dependent.
To remain portable (in the context of Bash), stick to the POSIX ERE specification.
Note that =~ even allows you to define capture groups (parenthesized subexpressions) whose matches you can later access through Bash's special ${BASH_REMATCH[#]} array variable.
Further notes:
$kREGEX_DATE is used unquoted, which is necessary for the regex to be recognized as such (quoted parts would be treated as literals).
While not always necessary, it is advisable to store the regex in a variable first, because Bash has trouble with regex literals containing \.
E.g., on Linux, where \< is supported to match word boundaries, [[ 3 =~ \<3 ]] && echo yes doesn't work, but re='\<3'; [[ 3 =~ $re ]] && echo yes does.
I've changed variable name REGEX_DATE to kREGEX_DATE (k signaling a (conceptual) constant), so as to ensure that the name isn't an all-uppercase name, because all-uppercase variable names should be avoided to prevent conflicts with special environment and shell variables.
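The quoting rule from the notes above can be checked with a short sketch (assuming Bash 3.2 or later; the variable names are illustrative):

```shell
#!/usr/bin/env bash
# Sketch of the quoting rule for =~ (assumes Bash 3.2 or later).
re='^[0-9]{2}[-/][0-9]{2}[-/][0-9]{4}$'
value='12-34-5678'

# Unquoted: $re is interpreted as a regular expression.
if [[ $value =~ $re ]]; then unquoted=match; else unquoted=nomatch; fi

# Quoted: "$re" is compared as a literal string, so the date does not match.
if [[ $value =~ "$re" ]]; then quoted=match; else quoted=nomatch; fi

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
```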
I think this is what you want:
REGEX_DATE='^\d{2}[/-]\d{2}[/-]\d{4}$'
echo "$1" | grep -P -q "$REGEX_DATE"
echo $?
I've used the -P switch to get Perl-compatible regular expressions.
The problem is that you're trying to use regex features not supported by grep. Namely, your \d won't work. Use this instead:
REGEX_DATE="^[[:digit:]]{2}[-/][[:digit:]]{2}[-/][[:digit:]]{4}$"
echo "$1" | grep -qE "${REGEX_DATE}"
echo $?
You need the -E flag to get ERE in order to use the {#} repetition syntax.
I have a list of lines:
<some_random_text="someval" my_val_="0.4" some_random_text_1="someval_">
<some_random_text="someval" my_val_="0.8" some_random_text_1="someval_">
<some_random_text="someval" my_val_="1.2" some_random_text_1="someval_">
and so on.
From each line, I want to return the numeric value given after my_val_. How can I do this in bash?
Within this very rigid structure, what you want to do is quite easy using sed:
sed 's/.*my_val_="\([0-9.]\{1,\}\)".*/\1/' file
or using extended regular expressions:
sed -r 's/.*my_val_="([0-9.]+)".*/\1/' file
This captures the part you're interested in (the digits and dots between the quotes) and uses them to replace the contents of the line.
As mentioned in the comments (thanks), the switch to enable extended regular expressions differs between versions of sed. Out of habit, I tend to use -r but some implementations (such as BSD sed on OSX) work with -E instead. Others work with either -r or -E but neither option is defined by the standard.
This could also be done in native bash (although I wouldn't recommend it...):
re='my_val_="([0-9.]+)"'
while read -r line; do
[[ $line =~ $re ]] && echo "${BASH_REMATCH[1]}"
done < file
=~ is the regex match operator. The captured digits and dots are stored in element 1 of the special array BASH_REMATCH.
The sed and bash approaches are subtly different, as the sed version will print all lines in the file, even if they don't match the pattern. If this is a problem, you can add the -n switch and a p at the end of the command to print matching lines:
sed -nr 's/.*my_val_="([0-9.]+)".*/\1/p' file
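A small sketch of that difference, using invented sample input in which one line has no my_val_ attribute. It uses the basic-regex form so that it does not depend on -r or -E:

```shell
#!/bin/sh
# Sample input with one line that lacks my_val_ (invented for illustration).
input='<a="x" my_val_="0.4" b="y">
<a="x" other="1" b="y">
<a="x" my_val_="1.2" b="y">'

# Without -n: the non-matching line is passed through unchanged.
all=$(printf '%s\n' "$input" | sed 's/.*my_val_="\([0-9.]\{1,\}\)".*/\1/')

# With -n and the p flag: only lines where the substitution happened are printed.
only=$(printf '%s\n' "$input" | sed -n 's/.*my_val_="\([0-9.]\{1,\}\)".*/\1/p')

printf 'without -n:\n%s\n' "$all"
printf 'with -n and p:\n%s\n' "$only"
```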
With grep:
grep -oP 'my_val_="\K[^"]*' filename
-o so that grep prints only the match, -P so that Perl-compatible regexes are used.
The \K in the regex removes from the match everything that was matched by the part of the regex that came before it; this has the effect of a lookbehind: only non-quote characters that come directly after my_val_=" are matched.
I need to extract part of a string in a shell script. The original string is pretty complicated, so I really need a regular expression to select the right part of it; just removing a prefix and suffix won't work. Also, the regular expression needs to check the context of the string I want to extract, so I need, for example, a regular expression a\([^b]*\)b to extract 123 from 12a123b23.
The shell script needs to be portable, so I cannot make use of the Bash constructs [[ and BASH_REMATCH.
I want the script to be robust, so when the regular expression does not match, the script should notice this e.g. through a non-zero exit code of the command to be used.
What is a good way to do this?
I've tried various tools, but none of them fully solved the problem:
expr match "$original" ".*$regex.*" works except for the error case. With this command, I don't know how to detect if the regex did not match. Also, expr seems to take the extracted string to determine its exit code - so when I happened to extract 00, expr had an exit code of 1. So I would need to generally ignore the exit code with expr match "$original" ".*$regex.*" || true
echo "$original" | sed "s/.*$regex.*/\\1/" also works except for the error case. To handle this case, I'd need to test if I got back the original string, which is also quite inelegant.
So, isn't there a better way to do this?
You could use the -n option of sed to suppress output of all input lines and add the p option to the substitute command, like this:
echo "$original" | sed -n -e "s/.*$regex.*/\1/p"
If the regular expression matches, the matched group is printed as before. But now if the regular expression does not match, nothing is printed and you will need to test only for the empty string.
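Wrapped into a portable function, this could look like the following sketch. The name extract_group is hypothetical, and the function assumes the regex contains exactly one \(...\) group and no unescaped / characters:

```shell
#!/bin/sh
# extract_group is a hypothetical helper name.
# Prints the first \(...\) group of the BRE in $2 matched against $1
# and returns non-zero when nothing was extracted.
extract_group() {
    extracted=$(printf '%s\n' "$1" | sed -n -e "s/.*$2.*/\1/p")
    [ -n "$extracted" ] || return 1
    printf '%s\n' "$extracted"
}

if result=$(extract_group '12a123b23' 'a\([^b]*\)b'); then
    echo "extracted: $result"
else
    echo "no match" >&2
fi
```

One limitation of testing for the empty string: a group that legitimately matches nothing (for example ab with the pattern above) is reported as a failed match.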
How about grep -o? The only possible problem is portability; otherwise it satisfies all requirements:
➜ echo "hello and other things" | grep -o hello
hello
➜ echo $?
0
➜ echo "hello and other things" | grep -o nothello
➜ echo $?
1
One of the best things is that, since it's grep, you can pick which regex flavor you want: BRE, ERE or Perl.
If egrep is available (pretty much all the time):
egrep 'YourPattern' YourFile
or
egrep "${YourPattern}" YourFile
If only grep is available:
grep -e 'YourPattern' YourFile
Check the status of the command with a classical [ $? -eq 0 ] (this also accounts for failure to access YourFile).
For the content itself, extract it with sed or awk (for portability) after the failure test:
Content="$( sed -n -e "s/.*\(${YourPattern}\).*/\1/p;q" YourFile )"
I am iterating through file names with bash and am in need of a way to pull out a specific number from the string notated by a preceding character. Essentially, all of the files have a part of their name that looks like D01 or D02. An example filename is Build-asdasdasd.D01V02.dat. I am trying to use sed, but to no avail thus far. Thanks!
Pure Bash:
name='Build-asdasdasd.D01V02.dat'
[[ "$name" =~ \.(D[[:digit:]]{2}[[:upper:]][[:digit:]]{2})\. ]] \
&& number="${BASH_REMATCH[1]}" || number=''
echo "'$number'"
The echo shows
'D01V02'
You don't have to do everything in a single expression. You can build a pipeline, like so:
echo 'Build-asdasdasd.D01V02.dat' |
egrep -o '\.D([[:digit:]]{2}[^.]+)' |
sed 's/^.//'
This returns D01V02 for me, but you may want to test your expression against a wider corpus to see if there are any edge cases.
Here is another pure bash answer that assumes your filenames are always similar to the example. Regex is not required.
name='Build-asdasdasd.D01V02.dat'
number="${name%.*}"
number="${number##*.}"
echo "$number"
Your question is quite unclear. If you want the digits after the D, you can use
f="Build-asdasdasd.D01V02.dat"
num=$(grep -Po '(?<=D)\d\d' <<< "$f")
or
num=$(sed 's/^.*D\([[:digit:]][[:digit:]]\).*/\1/' <<< "$f")
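Since the question mentions iterating over file names, here is a sketch of the same extraction in a Bash loop without external tools. The second and third file names and the array name nums are made up for illustration:

```shell
#!/usr/bin/env bash
# Sketch: pull the two digits after ".D" from several file names.
# The extra file names and the pattern are illustrative assumptions.
re='\.D([0-9]{2})'
nums=()
for f in Build-asdasdasd.D01V02.dat Build-qwerty.D02V01.dat notes.txt; do
    if [[ $f =~ $re ]]; then
        nums+=("${BASH_REMATCH[1]}")
        echo "$f -> ${BASH_REMATCH[1]}"
    else
        echo "$f -> no D-number found" >&2
    fi
done
```

Anchoring the pattern on the preceding dot avoids picking up a stray D elsewhere in the name, and names that do not match are reported instead of silently producing an empty value.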