Perl Regex not working in Bash script? - regex

I have a regular expression:
^(.+?)(\.[^.]+$|$)
which separates a file name and the file extension (if there is one)
http://movingtofreedom.org/2008/04/01/regex-match-filename-base-and-extension/
Works perfectly fine in Perl
Say $FILE ='.myfile.form.txt'
$1 is '.myfile.form' and
$2 is '.txt', as they should be
I know Bash regex and Perl regex aren't the same, but I've never had a problem with Bash Rematching until now
But when I try to use it in a Bash script as, say...
FILE='.myfile.form.txt'
[[ $FILE =~ ^(.+?)(\.[^.]+$|$) ]]
${BASH_REMATCH[1]} will just have the entire file name (.myfile.form.txt), and nothing in ${BASH_REMATCH[2]}
I'm wondering what's wrong/going on here
Thanks for any help!

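Bash's =~ uses POSIX extended regular expressions: regex(7), which is referenced by regex(3), which is referenced by bash(1), makes no mention of greediness modifiers. A non-greedy quantifier such as +? is therefore not available, and your pattern cannot be written as-is in a Bash regex.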
This doesn't mean you can't achieve what you want, though.
[[ $FILE =~ ^(.+)(\.[^.]*)$ ]] || [[ $FILE =~ ^(.*)()$ ]]
file="${BASH_REMATCH[1]}"
ext="${BASH_REMATCH[2]}"
Or something more straightforward like
if [[ $FILE =~ ^(.+)(\.[^.]*)$ ]]; then
file="${BASH_REMATCH[1]}"
ext="${BASH_REMATCH[2]}"
else
file="$FILE"
ext=""
fi
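As a rough check of the if/else variant above, here is a minimal sketch that wraps it in a function and runs it on a few names. The function name split_name and the variables base and ext are hypothetical, chosen only for this illustration.

```bash
#!/usr/bin/env bash

# split_name is a hypothetical helper: it sets the globals `base` and `ext`.
split_name() {
    local name=$1
    # Keeping the pattern in a variable avoids shell-quoting surprises.
    local re='^(.+)(\.[^.]*)$'
    if [[ $name =~ $re ]]; then
        base=${BASH_REMATCH[1]}   # greedy .+ runs up to the last dot
        ext=${BASH_REMATCH[2]}    # the last dot plus whatever follows it
    else
        base=$name                # no dot after the first character
        ext=''
    fi
}

for name in '.myfile.form.txt' 'README' '.bashrc'; do
    split_name "$name"
    printf '%s -> base=[%s] ext=[%s]\n' "$name" "$base" "$ext"
done
```

Because .+ needs at least one character before the dot, a name like .bashrc falls through to the else branch and is treated as having no extension.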

Related

Multiline Bash Regex Match

I've been reading through tons of threads about this, but none have helped me yet.
This is a sample of my text:
[userId:#"1"
userName:#""
userPos:#[#"11006321C", ]
userDisp:#[#"4200012FD6", ]];
[userId:#"2"
userName:#""
userPos:#[#"412520084C",
#"7200851",
#"54720021",
]
userDisp:#[#"230035FD6",
#"3213456432C0035FD6",
#"1F200538D5",
]];
I'm trying to capture this:
userPos:#[#"11006321C", ]
userDisp:#[#"4200012FD6", ]]
and
userPos:#[#"412520084C",
#"7200851",
#"54720021",
]
userDisp:#[#"230035FD6",
#"3213456432C0035FD6",
#"1F200538D5",
]]
(All matches from the text) using regex: userPos:#\[((?:.\r?\n?)*)\]
Trying it in bash using:
for string in $file # text has been read into this variable
do
[[ $word =~ $regex ]]
if [[ ${BASH_REMATCH[0]} ]]
then
string="${x:+x }${BASH_REMATCH[0]}"
userlist+=("$string")
echo "$string"
fi
done
To append them to a list.
But this doesn't work since the regex matches nothing at all. I know there are different kinds of regex engines, and I've tried many different regexes to get this to work in bash, but without success.
Anyone who could help me capture what I want in bash?
The regex you're looking for is userPos:([^;]*)].
regex="userPos:([^;]*)]"
while [[ $text =~ $regex ]]
do
string="${x:+x }${BASH_REMATCH[0]}"
userlist+=("$string")
echo "$string"
text=${text#*"${BASH_REMATCH[0]}"}
done
$text is your text.
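For reference, a self-contained sketch of the same loop run against the sample text from the question. It works on a copy of the text (the name rest is an assumption made here) so the original variable is left intact, and it drops the ${x:+x } prefix for simplicity.

```bash
#!/usr/bin/env bash

# Sample text copied from the question.
text='[userId:#"1"
userName:#""
userPos:#[#"11006321C", ]
userDisp:#[#"4200012FD6", ]];
[userId:#"2"
userName:#""
userPos:#[#"412520084C",
#"7200851",
#"54720021",
]
userDisp:#[#"230035FD6",
#"3213456432C0035FD6",
#"1F200538D5",
]];'

# [^;]* also matches newlines, so one match can span several lines.
regex='userPos:([^;]*)]'
userlist=()
rest=$text

while [[ $rest =~ $regex ]]; do
    userlist+=("${BASH_REMATCH[0]}")
    # Drop everything up to and including this match, then search again.
    rest=${rest#*"${BASH_REMATCH[0]}"}
done

printf 'found %d blocks\n' "${#userlist[@]}"
printf '%s\n---\n' "${userlist[@]}"
```

Each array element holds one userPos/userDisp block, ending at the ]] just before the semicolon.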

regex for finding file extension

I am using the regex below in my script to read files whose names end in _L001_R1_001.fastq or _L001_R2_001.fastq.
If it is R1 it should be read into readPair_1, and if R2 it should be read into readPair_2, but it's not matching anything.
Can anyone please tell me what is wrong here?
My script:
#! /bin/bash -l
Proj_Dir="${se_ProjDir}/*.fastq"
for Dir in $Proj_Dir
do
if [[ "$Dir" =~ _L.*_R1_001.fastq]]
then
readPair_1=$Dir
echo $readPair_1
fi
if [[ "$Dir" =~ _L.*_R2_001.fastq]]
then
readPair_2=$Dir
echo $readPair_2
fi
Files:
Next-ID-1-MN-SM5144-170509-ABC_S1_L001_R1_001.fastq
Next-ID-1-MN-SM5144-170509-ABC_S1_L001_R2_001.fastq
Next-ID-1-MN-SM5144-170509-ABC_S2_L001_R1_001.fastq
Next-ID-1-MN-SM5144-170509-ABC_S2_L001_R2_001.fastq
Next-ID-1-MN-SM5144-170509-ABC_S3_L001_R1_001.fastq
Next-ID-1-MN-SM5144-170509-ABC_S3_L001_R2_001.fastq
You need .gz at the end of your pattern. You're not getting any files at all:
Proj_Dir="${se_ProjDir}/*.fastq.gz"
You also need spaces before ]]:
if [[ "$Dir" =~ _L.*_R1_001.fastq ]]
and
if [[ "$Dir" =~ _L.*_R2_001.fastq ]]
Try:
L001_R[12]_001\.fastq\.gz$
This will look for either the R1 or R2 files, and ensure that that's how the filename string ends.
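A minimal sketch of how such an end-anchored pattern could sort the files into the two groups. The file names are taken from the question; making the .gz suffix optional and collecting the results in arrays (rather than the scalar variables of the original script) are assumptions made for this illustration.

```bash
#!/usr/bin/env bash

# File names copied from the question. The optional (\.gz)? is an assumption
# so that one pattern covers both compressed and uncompressed files.
files=(
    'Next-ID-1-MN-SM5144-170509-ABC_S1_L001_R1_001.fastq'
    'Next-ID-1-MN-SM5144-170509-ABC_S1_L001_R2_001.fastq'
    'Next-ID-1-MN-SM5144-170509-ABC_S2_L001_R1_001.fastq'
    'Next-ID-1-MN-SM5144-170509-ABC_S2_L001_R2_001.fastq'
)

re='_L001_R([12])_001\.fastq(\.gz)?$'
readPair_1=()
readPair_2=()

for f in "${files[@]}"; do
    [[ $f =~ $re ]] || continue
    # The first capture group holds the read number, 1 or 2.
    case ${BASH_REMATCH[1]} in
        1) readPair_1+=("$f") ;;
        2) readPair_2+=("$f") ;;
    esac
done

printf 'R1: %s\n' "${readPair_1[@]}"
printf 'R2: %s\n' "${readPair_2[@]}"
```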
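The =~ operator succeeds when the regular expression matches any part of the string, so the pattern does not need a leading .* to work; use ^ and $ only when the whole string must match. Writing .*_L.*_R1_001.fastq and .*_L.*_R2_001.fastq is therefore harmless but not required. Escaping the dot (\.fastq) keeps it from matching any character.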

How can I run a regex against a filename?

In a list of files:
javascript-custom-rules-plugin-1.0-SNAPSHOT.jar
README.txt
sonar-build-breaker-plugin-2.0.jar
sonar-javascript-plugin-2.11.jar
tmo-custom-rules-1.0.jar
I am attempting to match these filenames by regex.
My Script
#!/usr/bin/env bash
install_location=/usr/local/sonar/extensions/plugins
for f in $(ls -1 $install_location)
do
# remove any previous versions of this plugin
if [[ "$f" =~ ".*tmo-custom-rules-(.+)\.jar" ]]
then
echo "found $f. will remove"
else
echo "$f doesn't match"
fi
done
I've tried if [[ "$f" =~ ".*tmo-custom-rules-(.+)\.jar" ]] and if [[ "$f" == *"tmo-custom-rules" ]] to no avail.
I'm getting
javascript-custom-rules-plugin-1.0-SNAPSHOT.jar doesn't match
README.txt doesn't match
sonar-build-breaker-plugin-2.0.jar doesn't match
sonar-javascript-plugin-2.11.jar doesn't match
tmo-custom-rules-1.0.jar doesn't match
when I expect found tmo-custom-rules-1.0.jar. will remove
I've run my regular expression through many regular expression testers with the data above, and they all return the correct matches, but I can't get it to work here in my script.
How can I loop through, and check to see if any files matches this regular expression?
In Bash, the regex on the right-hand side of =~ must be unquoted (a quoted pattern is matched as a literal string), so this should work:
[[ $f =~ .*tmo-custom-rules-(.+)\.jar ]]
Or better:
re=".*tmo-custom-rules-(.+)\.jar"
[[ $f =~ $re ]]
However you don't even need regex and can use shell glob matching:
#!/usr/bin/env bash
install_location=/usr/local/sonar/extensions/plugins
for f in "$install_location"/*
do
# remove any previous versions of this plugin
if [[ $f == *tmo-custom-rules-*.jar ]]
then
echo "found $f. will remove"
else
echo "$f doesn't match"
fi
done
Note that this also avoids using the output of ls, which is not always fit for scripting.
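To see the unquoted-regex form in action without touching a real directory, here is a small sketch over the file names listed in the question. Holding the names in an array, and capturing the version with BASH_REMATCH, are assumptions added for illustration; a real script would loop over "$install_location"/* as shown above.

```bash
#!/usr/bin/env bash

# Names copied from the question's listing.
plugins=(
    'javascript-custom-rules-plugin-1.0-SNAPSHOT.jar'
    'README.txt'
    'sonar-build-breaker-plugin-2.0.jar'
    'sonar-javascript-plugin-2.11.jar'
    'tmo-custom-rules-1.0.jar'
)

re='tmo-custom-rules-(.+)\.jar$'
matched=()
versions=()

for f in "${plugins[@]}"; do
    if [[ $f =~ $re ]]; then      # $re is deliberately left unquoted
        matched+=("$f")
        versions+=("${BASH_REMATCH[1]}")
        echo "found $f (version ${BASH_REMATCH[1]}). will remove"
    else
        echo "$f doesn't match"
    fi
done
```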
You can do this with expr using the colon operator:
if expr "$f" : '.*tmo-custom-rules-.*\.jar' > /dev/null; then
echo matches
fi
Note that regular expressions in this context are implicitly anchored to the start of the string.
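The anchoring is easy to check with a short sketch, assuming a POSIX expr is on the PATH. The variable names bare and prefixed are hypothetical.

```bash
#!/usr/bin/env bash

f='tmo-custom-rules-1.0.jar'

# Implicitly anchored at the start: "custom-rules" is not at position 0,
# so expr reports no match (exit status 1).
if expr "$f" : 'custom-rules' > /dev/null; then
    bare=yes
else
    bare=no
fi

# A leading .* lets the rest of the pattern match further into the string.
if expr "$f" : '.*custom-rules' > /dev/null; then
    prefixed=yes
else
    prefixed=no
fi

echo "without .*: $bare, with .*: $prefixed"
```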

bash if [[ =~ regex compare not working?

I have a value in a variable that may be absolute or relative url, and I need to check which one it is.
I have found that there's a =~ operator in [[, but I can't get it to work. What am I doing wrong?
url="http://test"
if [[ "$url" =~ "^http://" ]];
then echo "absolute.";
fi;
You need to use regex without quote:
url="http://test"
if [[ "$url" =~ ^http:// ]]; then
echo "absolute."
fi
This outputs absolute. The regex needs to be unquoted in newer Bash versions (after Bash v3.1).
Or avoid regex and use glob matching:
if [[ "$url" == "http://"* ]]; then
echo "absolute."
fi
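A small sketch that puts the unquoted form in a reusable function and contrasts it with the quoted form from the question. The function name is_absolute and the decision to accept https as well are assumptions made for this example.

```bash
#!/usr/bin/env bash

# is_absolute is a hypothetical helper; accepting https too is an assumption.
is_absolute() {
    local re='^https?://'
    [[ $1 =~ $re ]]
}

for url in 'http://test' 'https://test/page' '/images/logo.png' 'page.html'; do
    if is_absolute "$url"; then
        echo "$url: absolute"
    else
        echo "$url: relative"
    fi
done

# Quoting the pattern makes ^ a literal character, so this does not match
# in Bash 3.2 and later.
if [[ 'http://test' =~ "^http://" ]]; then
    quoted=match
else
    quoted=nomatch
fi
echo "quoted pattern: $quoted"
```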

Regular expression for extracting date

I need a regular expression to match a date of the form 01/Jan/2000:23:59:59. I managed to match the pattern using Notepad++'s regex interpreter, using the following regex:
[1-3][0-9]/[A-Z][a-z]{2}/(19|20)[0-9]{2}:[0-9]{2}:[0-5][0-9]:[0-5][0-9]
Unfortunately, I need to do this with bash. AWK is not an option right now, I'm afraid. So, I tried to convert the above regex into something that bash would interpret in the same way. Thus far, I've come up with this:
[1-3][0-9]/[A-Z][a-z]\{2\}/(19|20)[0-9]\{2\}:[0-2][0-9]:[0-5][0-9]:[0-5][0-9]
The full command I'm using is
expr "$line" : '\([1-3][0-9]/[A-Z][a-z]\{2\}/(19|20)[0-9]\{2\}:[0-2][0-9]:[0-5][0-9]:[0-5][0-9]\)'
where $line contains the string out of which I need to extract the date. Unfortunately my bash version of the regex doesn't work. I have tried different things, like escaping / and :, but I can't seem to get it to work. What am I doing wrong?
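The main problem is your first bracket expression, [1-3]. It should be [0-3], because the day can start with 0 (as in 01/Jan/2000). With that fixed, the original pattern works as-is with Bash's =~ operator: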
[[ $DATE =~ [0-3][0-9]/[A-Z][a-z]{2}/(19|20)[0-9]{2}:[0-9]{2}:[0-5][0-9]:[0-5][0-9] ]]
Also, on some earlier versions of Bash you have to store it in a variable:
RE='[0-3][0-9]/[A-Z][a-z]{2}/(19|20)[0-9]{2}:[0-9]{2}:[0-5][0-9]:[0-5][0-9]'
[[ $DATE =~ $RE ]]
Example:
> DATE='01/Jan/2000:23:59:59'
> [[ $DATE =~ [0-3][0-9]/[A-Z][a-z]{2}/(19|20)[0-9]{2}:[0-9]{2}:[0-5][0-9]:[0-5][0-9] ]] && echo Match.
Match.
Bash 3.0:
> echo "$BASH_VERSION"
3.00.0(1)-release
> DATE='01/Jan/2000:23:59:59'
> RE='[0-3][0-9]/[A-Z][a-z]{2}/(19|20)[0-9]{2}:[0-9]{2}:[0-5][0-9]:[0-5][0-9]'
> [[ $DATE =~ $RE ]] && echo Match.
Match.
If you want to apply it on a loop, you can have something like this:
RE='[0-3][0-9]/[A-Z][a-z]{2}/(19|20)[0-9]{2}:[0-9]{2}:[0-5][0-9]:[0-5][0-9]'
while read -r LINE; do
[[ $LINE =~ $RE ]] && echo "Match: $LINE"
done < date_list.txt
By the way, if you want to match the whole string exactly, add ^ and $ at the beginning and end of the pattern:
[[ $DATE =~ ^[0-3][0-9]/[A-Z][a-z]{2}/(19|20)[0-9]{2}:[0-9]{2}:[0-5][0-9]:[0-5][0-9]$ ]]
To extract matches on the line use () and BASH_REMATCH:
[[ $DATE =~ .*([0-3][0-9]/[A-Z][a-z]{2}/(19|20)[0-9]{2}:[0-9]{2}:[0-5][0-9]:[0-5][0-9]).* ]] && echo "${BASH_REMATCH[1]}"
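Since =~ already matches anywhere in the string, the surrounding .* is optional: the whole match is available in BASH_REMATCH[0]. A minimal sketch, where the function name extract_date and the sample log line are invented for illustration:

```bash
#!/usr/bin/env bash

# extract_date is a hypothetical helper: it prints the first date-like
# substring of its argument and returns non-zero if there is none.
extract_date() {
    local re='[0-3][0-9]/[A-Z][a-z]{2}/(19|20)[0-9]{2}:[0-9]{2}:[0-5][0-9]:[0-5][0-9]'
    [[ $1 =~ $re ]] && printf '%s\n' "${BASH_REMATCH[0]}"
}

# This log line is made up for the example.
line='127.0.0.1 - - [01/Jan/2000:23:59:59 +0000] "GET / HTTP/1.1" 200'

date_found=$(extract_date "$line")
echo "$date_found"
```

Here BASH_REMATCH[1] would hold only the century group (19 or 20), which is why the sketch reads index 0 instead.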