What is the Regular expression for multiple word combinations - regex

Can someone help me find a regular expression for the pattern below?
I am trying to use a regular expression to match a pattern in a shell script.
The pattern can be any combination of 'ab', 'xy', 'ij', 'pqr', in any order and separated by a comma ',', or just 'all' on its own.
Ex.
1) "ab,xy,ij,pqr" - valid
2) "ij,pqr" - valid
3) "all" - valid
4) "ij,ab," - invalid because it ends with a comma
5) "all,xy" - invalid because 'all' cannot be combined with xy ij pqr or ab
6) ",xy" - invalid because it starts with comma
7) "xy" - valid
Thank you.
@konsolebox @491243 @ajp15243 @Jerry
Looks like I am doing something wrong: so far it works only for regex4="(ab|xy|ij|pqr)(,(ab|xy|ij|pqr))*", and even then only if the string is like "ab,xy"; just "ab" doesn't work.
Here is what I have attempted so far:
#!/usr/bin/ksh
echo
echo echo $1
echo
regex2="^(all|(ab|xy|ij|pqr)(,(ab|xy|ij|pqr))*)$"
regex3="^(all|(ab|xy|ij|pqr)(,(ab|xy|ij|pqr))*)$"
regex4="(ab|xy|ij|pqr)(,(ab|xy|ij|pqr))*"
if [[ $1 == $regex2 ]]
then
echo "You got it again 22222222 !"
fi
if [[ $1 == $regex3 ]]
then
echo "You got it again 33333333 !"
fi
if [[ $1 == $regex4 ]]
then
echo "You got it again 44444444 !"
fi
Output:
$
$ test.ksh ab,xy
echo ab,xy
You got it again 44444444 !
$
$
$
$ test.ksh ab
echo ab
$
1:30 PST
OK, I made some progress:
"((ab|xy|ij|pqr)|(ab|xy|ij|pqr)(,(ab|xy|ij|pqr))*)|all)$"
This one works when the input is "xy" or "xy,ab", but it also treats "xy,ab,all" as valid input.

I think this will do it.
^(((ab|xy|ij|pqr)(,(ab|xy|ij|pqr))*)|all)$

Probably this:
^(all|(ab|xy|ij|pqr)(,(ab|xy|ij|pqr))*)$

OK, it looks like the RE below finally worked. Thank you all for the help.
(all)|(ab|xy|ij|pqr)|(ab|xy|ij|pqr)(,(ab|xy|ij|pqr)*)
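A likely reason the script in the question behaved oddly: inside [[ ]], the == operator does shell pattern (glob) matching, where * means "any string" and ^ and $ are literal characters. Regex matching uses the =~ operator instead. The following is a minimal sketch, assuming bash (ksh93 also has =~); the function name is_valid_list is made up for illustration. It uses the anchored expression suggested in the answers and runs it against the seven examples from the question.

```shell
#!/bin/bash
# is_valid_list is a hypothetical helper name, not from the original thread.
# Succeeds if $1 is "all" or a comma-separated list of ab, xy, ij, pqr.
is_valid_list() {
    # Keeping the regex in a variable avoids shell quoting surprises with =~.
    local re='^(all|(ab|xy|ij|pqr)(,(ab|xy|ij|pqr))*)$'
    [[ $1 =~ $re ]]
}

# The seven examples from the question.
for s in "ab,xy,ij,pqr" "ij,pqr" "all" "ij,ab," "all,xy" ",xy" "xy"; do
    if is_valid_list "$s"; then
        echo "\"$s\" - valid"
    else
        echo "\"$s\" - invalid"
    fi
done
```

Note that this sketch does not reject repeated words such as "ab,ab"; the question does not say whether those should be allowed.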

Related

Regular expression on bash/shell/python for githook pre-commit

I am trying to work with regular expressions.
I have a string in this format:
[+/-] Added Feature 305105:WWE-108. Added Dolph Ziggler super star
Let's look at each part of the string:
1) [+/-] – the square brackets are required. It can be [+], [-], or [+/-], but not "+", "-", or "+/-" without the brackets
2) Added – it can be "Added", "Resolved", or "Closed"
3) 305105 – any number
4) Feature – it can be "Feature", "Bug", or "Fix"
5) : – a very important delimiter
6) WWE-108 – any text, then the delimiter "-", then a number
7) . – a very important delimiter
8) Added Dolph Ziggler super star – any text
What I tried to do
Let's try to resolve each part:
1) echo '[+]' | egrep -o "[+/-]+". Yes, it works, but it also works for [+/] or [/], and the result is printed without the square brackets
2) echo "Resolved" | egrep -o "Added$|Resolved$|Closed$". It works
3) echo '124214215215' | egrep -o "[0-9]+$". It works
4) echo "Feature" | egrep -o "Feature$|Bug$|Fix$". It works too
5) I have not found how
6) echo "WWE-108" | egrep -o "[a-zA-Z]+-[0-9]+". It works too
7) I have not found how
8) Any text
The main question: how do I combine all these parts, separated by spaces, into one check in bash that follows this template: [+/-] Added Feature 305105:WWE-108. Added Dolph Ziggler super star. I am not familiar with regular expressions, so I would like to do something like this:
string="[+/-] Added Feature 305105:WWE-108. Added Dolph Ziggler super star"
first=$(echo $string | awk '{print $1}')
if [[ $first == "[+]" ]]; then
echo "ok"
echo $first
elif [[ $first == "[*]" ]]; then
echo "ok2"
echo $first
elif [[ $first == "[+/-]" ]]; then
echo "ok3"
echo "$first"
else
echo "not ok"
echo $first
exit 1
fi
But this does not work. Can you please help me write such a regexp in bash? Python is fine for me too.
Why am I doing this? I want to make a pre-commit hook that enforces a message format like this:
[+/-] Added Feature 305105:WWE-108. Added Dolph Ziggler super star
Answer taken from the comments, putting it all together:
egrep '^\[(\+|-|\+/-)\] (Added|Resolved|Closed) (Feature|Bug|Fix) [0-9]+:[a-zA-Z]+-[0-9]+\..+'
As a general rule, in extended regex the metacharacters .*+?^$(|)[]{}\ must be escaped with a backslash to be matched literally (except inside a bracket expression [...], where the rules are different).
Note, as background, that in basic regex it is the opposite: the backslash is used to enable the special meaning of the regex extensions (|){}+.
grep '^\[\(+\|-\|+/-\)\] \(Added\|Resolved\|Closed\) \(Feature\|Bug\|Fix\) [0-9]\+:[a-zA-Z]\+-[0-9]\+\..\+'
But it's longer and harder to understand.
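The same extended regex can also be used directly in bash with the =~ operator, without calling egrep. The following is a minimal sketch; the function name check_commit_msg is hypothetical and only the regex comes from the answer above.

```shell
#!/bin/bash
# check_commit_msg is a hypothetical helper name used for illustration.
# Succeeds if $1 follows the template
#   [+/-] Added Feature 305105:WWE-108. Added Dolph Ziggler super star
check_commit_msg() {
    # Same extended regex as the egrep command above, kept in a variable
    # so the shell does not reinterpret the backslashes and brackets.
    local re='^\[(\+|-|\+/-)\] (Added|Resolved|Closed) (Feature|Bug|Fix) [0-9]+:[a-zA-Z]+-[0-9]+\..+'
    [[ $1 =~ $re ]]
}

msg='[+/-] Added Feature 305105:WWE-108. Added Dolph Ziggler super star'
if check_commit_msg "$msg"; then
    echo "message accepted"
else
    echo "message rejected"
fi
```

For an actual hook, one option is git's commit-msg hook, which receives the path of the message file as its first argument; there the call could look like check_commit_msg "$(head -n 1 "$1")" || exit 1.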

While-Loop BASH-Regex not matching

Why is this INCREDIBLY simple regex not matching?!
#!/bin/bash
while true
do
read -r -p $'What is the JIRA Ticket associated with this work?' JIRA
# Use a regular expression to verify that the reply stored in JIRA is exactly 4 digits; if not, loop and try again.
if [[ ! "$JIRA" =~ [0-9]{4} ]]
then
echo -en "The JIRA Ticket should only be 4 digits\nPlease try again."
continue
else
break 1
fi
done
When prompted, if you type "ffffff" it is caught, but if you type more than 4 digits ("444444") or even toss letters in ("4444444fffff"), it catches nothing, hits the else block, and quits. I think this is basic, and I'm dumbfounded as to why it's not catching the extra digits or characters.
I appreciate the help.
You need to change your equality test to:
if [[ ! "$JIRA" =~ ^[0-9]{4}$ ]]
This ensures that the entire string contains just four digits. ^ means beginning of string, $ means end of string.
The regular expression is open-ended, meaning it only has to match a substring of the left-hand argument, not the entire thing. Anchor your regular expression to force it to match the entire string:
if [[ ! "$JIRA" =~ ^[0-9]{4}$ ]]
Maybe a simpler pattern (== instead of =~) may solve your issue:
#!/bin/bash
while true
do
read -r -p $'What is the JIRA Ticket associated with this work?' JIRA
[[ $JIRA == [0-9][0-9][0-9][0-9] ]] && break 1
echo -en "The JIRA Ticket should only be 4 digits\nPlease try again."
done
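To see the effect of anchoring side by side, here is a small sketch that runs both forms of the regex over the inputs mentioned in the question. The function name is_ticket is hypothetical.

```shell
#!/bin/bash
# is_ticket is a hypothetical helper name, not part of the original script.
is_ticket() {
    local re='^[0-9]{4}$'    # anchored: the whole string must be 4 digits
    [[ $1 =~ $re ]]
}

unanchored='[0-9]{4}'        # matches if any 4 consecutive digits appear

for input in 1234 444444 4444444fffff ffffff; do
    if [[ $input =~ $unanchored ]]; then loose="match"; else loose="no match"; fi
    if is_ticket "$input"; then strict="match"; else strict="no match"; fi
    echo "$input: unanchored=$loose, anchored=$strict"
done
```

Only 1234 passes the anchored check, while the unanchored form also accepts 444444 and 4444444fffff because each contains four consecutive digits somewhere.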

Regular expression Bash issue

I have to match a string made up of a lowercase word repeated twice, for example ballball or printprint. The word ball on its own is not accepted because it is not repeated.
For this I have the following code:
read input
expr='^(([a-z]*){2})$'
if [[ $input =~ $expr ]]; then
echo "OK string"
exit 0
fi
exit 10
but it doesn't work: for example, if I enter ball, the script prints "OK string".
What am I doing wrong?
Not all Bash versions support backreferences in regexes natively. If yours doesn't, you can use an external tool such as grep:
read input
re='^\([a-z]\+\)\1$'
if grep -q "$re" <<< "$input"; then
echo "OK string"
exit 0
fi
exit 1
grep -q is silent and has a successful exit status if there was a match. Notice how (, + and ) have to be escaped for grep. (grep -E would understand () without escaping.)
Also, I've replaced your * with + so we don't match the empty string.
Alternatively: your requirement means that a matching string has two identical halves, so we can check for just that, without any regexes:
read input
half=$(( ${#input} / 2 ))
if (( half > 0 )) && [[ ${input:0:$half} = "${input:$half}" ]]; then
echo "OK string"
fi
This uses Substring Expansion; the first check is to make sure that the empty string doesn't match.
Your requirement is to match strings made of two repeated words. This is easy to do by just checking if the first half of your string is equal to the remaining part. No need to use regexps...
$ var="byebye" && len=$((${#var}/2))
$ test "${var:0:$len}" = "${var:$len}" && { echo ok ; } || echo no
ok
$ var="abcdef" && len=$((${#var}/2))
$ test "${var:0:$len}" = "${var:$len}" && { echo ok ; } || echo no
no
The regex [a-z]* matches any string of lowercase letters, including the empty string.
([a-z]*){2} matches any two such strings in a row; they do not have to be identical.
Ergo, ^(([a-z]*){2})$ matches any string of zero or more lowercase letters.
Using the suggestion from @hwnd (capturing the word and replacing {2} with the backreference \1) will enforce a match on two identical strings.
N.B.: You will need a fairly recent version of bash. Tested in bash 4.3.11.
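The two ideas above (lowercase letters only, and two identical halves) can be combined without relying on backreference support. The following is a minimal sketch, assuming bash; the function name is_doubled_word is made up for illustration.

```shell
#!/bin/bash
# is_doubled_word is a hypothetical helper name used for illustration.
# Succeeds if $1 is a non-empty lowercase word written twice in a row.
is_doubled_word() {
    local s=$1
    local re='^[a-z]+$'
    local half=$(( ${#s} / 2 ))
    [[ $s =~ $re ]] || return 1           # lowercase letters only, non-empty
    (( ${#s} % 2 == 0 )) || return 1      # two equal halves need an even length
    # Quote the right-hand side so it is compared literally, not as a pattern.
    [[ ${s:0:half} == "${s:half}" ]]
}

for word in ballball printprint ball; do
    if is_doubled_word "$word"; then
        echo "$word: OK string"
    else
        echo "$word: rejected"
    fi
done
```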

Shell: Checking if argument exists and matches expression

I'm new to shell scripting and trying to write the ability to check if an argument exists and if it matches an expression. I'm not sure how to write expressions, so this is what I have so far:
#!/bin/bash
if [[ -n "$1"] && [${1#*.} -eq "tar.gz"]]; then
echo "Passed";
else
echo "Missing valid argument"
fi
To run the script, I would type this command:
# script.sh YYYY-MM.tar.gz
I believe what I have does the following:
if YYYY-MM.tar.gz is not given after script.sh, it will echo "Missing valid argument", and
if the file does not end in .tar.gz, it echoes the same error.
However, I also want to check that the full file name is in YYYY-MM.tar.gz format.
if [[ -n "$1" ]] && [[ "${1#*.}" == "tar.gz" ]]; then
-eq: (equal) for arithmetic tests
==: to compare strings
See: help test
You can also use:
case "$1" in
*.tar.gz) ;; #passed
*) echo "wrong/missing argument $1"; exit 1;;
esac
echo "ok arg: $1"
As long as the file is in the correct YYYY-MM.tar.gz format, it obviously is non-empty and ends in .tar.gz as well. Check with a regular expression:
if ! [[ $1 =~ ^[0-9]{4}-[0-9]{1,2}\.tar\.gz$ ]]; then
echo "Argument 1 not in correct YYYY-MM.tar.gz format"
exit 1
fi
Obviously, the regular expression above is too general, allowing names like 0193-67.tar.gz. You can adjust it to be as specific as you need it to be for your application, though. I might recommend
^[1-9][0-9]{3}-([1-9]|10|11|12)\.tar\.gz$
to allow only 4-digit years starting with 1000 (support for the first millennium CE seems unnecessary) and only months 1-12 (no leading zero). The anchors ^ and $ force the whole name to match, and the dots are escaped so that they match only a literal dot.
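If the file names really use a two-digit month, as the YYYY-MM in the question suggests, the month part can be written as (0[1-9]|1[0-2]) instead. The following sketch rests on that assumption; the function name is_monthly_archive is hypothetical.

```shell
#!/bin/bash
# is_monthly_archive is a hypothetical helper name used for illustration.
# Assumption: months are always two digits (01-12), as YYYY-MM suggests.
is_monthly_archive() {
    # ^ and $ anchor the whole name; \. matches a literal dot only.
    local re='^[1-9][0-9]{3}-(0[1-9]|1[0-2])\.tar\.gz$'
    [[ $1 =~ $re ]]
}

for name in "2014-05.tar.gz" "2014-13.tar.gz" "0193-67.tar.gz" ""; do
    if is_monthly_archive "$name"; then
        echo "ok arg: $name"
    else
        echo "Argument '$name' not in correct YYYY-MM.tar.gz format"
    fi
done
```

An empty or missing argument fails the same check, so no separate -n test is needed.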

sed regex to match ['', 'WR' or 'RN'] + 2-4 digits

I'm trying to do some conditional text processing on Unix and struggling with the syntax. I want to achieve the following:
Find the first 2, 3 or 4 digits in the string
if 2 characters before the found digits are 'WR' (could also be lower case)
Variable = the string we've found (e.g. WR1234)
Type = "work request"
else
if 2 characters before the found digits are 'RN' (could also be lower case)
Variable = the string we've found (e.g. RN1234)
Type = "release note"
else
Variable = "WR" + the string we've found (Prepend 'WR' to the digits)
Type = "Work request"
fi
fi
I'm doing this in a Bash shell on Red Hat Enterprise Linux Server release 5.5 (Tikanga)
Thanks in advance,
Karl
I'm not sure how you read in your strings, but this example should help you get there. I loop over two example strings, one starting with WR1234 and one starting with RN5678. You didn't say what to do if the string doesn't match your expected format (a PQ2342 prefix, for example), so my code just ignores it.
#!/bin/bash
for string in "WR1234 - Work Request Name.doc" "RN5678 - Release Note.doc"; do
[[ $string =~ ^([^0-9]*)([0-9]*).*$ ]]
case ${BASH_REMATCH[1]} in
"WR")
var="${BASH_REMATCH[1]}${BASH_REMATCH[2]}"
type="work request"
echo -e "$var\t-- $type"
;;
"RN")
var="${BASH_REMATCH[1]}${BASH_REMATCH[2]}"
type="release note"
echo -e "$var\t-- $type"
;;
"")
var="WR${BASH_REMATCH[2]}"
type="work request"
echo -e "$var\t-- $type"
;;
esac
done
Output
$ ./rematch.sh
WR1234 -- work request
RN5678 -- release note
I like to use perl -pe instead of sed because Perl has such expressive regular expressions. The following is a bit verbose for the sake of instruction.
example.txt:
WR1234 - Work Request name.doc
RN456
rn456
WR7890 - Something else.doc
wr789
2456
script.sh:
#!/bin/bash
# search for 'WR' or 'RN' followed by 2-4 digits and anything else, but capture
# just the part we care about
records=$(perl -pe 's/^((WR|RN)(\d{2,4})).*/$1/i' example.txt)
# now that you've filtered out the records, you can do something like replace
# WR's with 'work request'
work_requests=$(echo "$records" | perl -pe 's/wr/work request /ig' | perl -pe 's/rn/release note /ig')
# or add 'WR' to lines w/o a listing
work_requests=$(echo "$work_requests" | perl -pe 's/^(\d)/work request $1/')
# or make all of them uppercase
records_upper=$(echo "$records" | tr '[:lower:]' '[:upper:]')
# or count WR's
wr_count=$(echo "$records" | grep -i wr | wc -l)
echo count $wr_count
echo "$work_requests"
#!/bin/bash
string="RN1234 - Work Request Name.doc"
echo "$string" | gawk --re-interval '
{
if(match ($0,/(..)[0-9]{2,4}\>/,a ) ){
if (a[1]=="WR"){
type="Work request"
}else if ( a[1] == "RN" ){
type = "Release Notes"
}
print type
}
}'
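The pseudocode in the question can also be followed fairly literally in pure bash. The following is a minimal sketch, assuming bash 3.2 or later for =~ and BASH_REMATCH; the function name classify and its tab-separated output format are made up for illustration. It finds the first run of 2-4 digits, accepts WR or RN in either case immediately before it, and otherwise prepends WR.

```shell
#!/bin/bash
# classify is a hypothetical helper name used for illustration.
# Prints "<id><TAB><type>" for the first run of 2-4 digits in $1,
# or returns 1 if the string contains no such run.
classify() {
    local s=$1 prefix digits
    # Group 1: optional WR/RN prefix in any case; group 2: the 2-4 digits.
    local re='([Ww][Rr]|[Rr][Nn])?([0-9]{2,4})'
    [[ $s =~ $re ]] || return 1
    prefix=${BASH_REMATCH[1]}
    digits=${BASH_REMATCH[2]}
    case $prefix in
        [Ww][Rr]) printf '%s\t%s\n' "$prefix$digits" "work request" ;;
        [Rr][Nn]) printf '%s\t%s\n' "$prefix$digits" "release note" ;;
        *)        printf '%s\t%s\n' "WR$digits" "work request" ;;   # no known prefix
    esac
}

for string in "WR1234 - Work Request Name.doc" "rn456" "7890" "PQ2342"; do
    classify "$string"
done
```

In this sketch an unknown prefix such as PQ is treated like a missing one, so PQ2342 becomes WR2342; whether that is the desired behaviour is not stated in the question.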