regex for finding file extension - regex

I am using the regex below in my script to read files whose names end in _L001_R1_001.fastq or _L001_R2_001.fastq.
An R1 file should be read into readPair_1 and an R2 file into readPair_2, but nothing matches.
Can anyone please tell me what is wrong here?
My script:
#! /bin/bash -l
Proj_Dir="${se_ProjDir}/*.fastq"
for Dir in $Proj_Dir
do
if [[ "$Dir" =~ _L.*_R1_001.fastq]]
then
readPair_1=$Dir
echo $readPair_1
fi
if [[ "$Dir" =~ _L.*_R2_001.fastq]]
then
readPair_2=$Dir
echo $readPair_2
fi
Files:
Next-ID-1-MN-SM5144-170509-ABC_S1_L001_R1_001.fastq
Next-ID-1-MN-SM5144-170509-ABC_S1_L001_R2_001.fastq
Next-ID-1-MN-SM5144-170509-ABC_S2_L001_R1_001.fastq
Next-ID-1-MN-SM5144-170509-ABC_S2_L001_R2_001.fastq
Next-ID-1-MN-SM5144-170509-ABC_S3_L001_R1_001.fastq
Next-ID-1-MN-SM5144-170509-ABC_S3_L001_R2_001.fastq

You need .gz at the end of your pattern. You're not getting any files at all:
Proj_Dir="${se_ProjDir}/*.fastq.gz"
You also need spaces before ]]:
if [[ "$Dir" =~ _L.*_R1_001.fastq ]]
and
if [[ "$Dir" =~ _L.*_R2_001.fastq ]]

Try:
L001_R[12]_001\.fastq\.gz$
This will look for either the R1 or R2 files, and ensure that that's how the filename string ends.
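As a rough sketch of how such an anchored pattern could sort the names into the two read pairs, the capture group below picks out the 1 or 2 after R. The helper name read_pair_of is made up for illustration, and the optional .gz suffix is an assumption, since the listing in the question shows plain .fastq names:

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints 1 or 2 for an R1/R2 FASTQ name, fails otherwise.
# Assumption: the .gz suffix is optional.
read_pair_of() {
    local re='_L[0-9]+_R([12])_001\.fastq(\.gz)?$'
    [[ $1 =~ $re ]] || return 1
    printf '%s\n' "${BASH_REMATCH[1]}"   # first capture group: 1 or 2
}

for name in \
    Next-ID-1-MN-SM5144-170509-ABC_S1_L001_R1_001.fastq \
    Next-ID-1-MN-SM5144-170509-ABC_S1_L001_R2_001.fastq.gz \
    notes.txt
do
    if pair=$(read_pair_of "$name"); then
        echo "$name -> readPair_$pair"
    else
        echo "$name -> skipped"
    fi
done
```

Keeping the regex in a variable and expanding it unquoted avoids the quoting pitfalls of writing it inline after =~.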

The =~ operator succeeds when the regular expression matches any part of the string, so a leading .* is optional: .*_L.*_R1_001.fastq and .*_L.*_R2_001.fastq match the same names as the patterns without it. To require the match at the end of the name, anchor the pattern with $.

Related

Regex not matching name in filepath

I have a folder with ipa files. I need to identify them by whether the filename contains appstore or enterprise.
mles:drive-ios-swift mles$ ls build
com.project.drive-appstore.ipa
com.project.test.swift.dev-enterprise.ipa
com.project.drive_v2.6.0._20170728_1156.ipa
I've tried:
#!/bin/bash -veE
fileNameRegex="**appstore**"
for appFile in build-test/*{.ipa,.apk}; do
if [[ $appFile =~ $fileNameRegex ]]; then
echo "$appFile Matches"
else
echo "$appFile Does not match"
fi
done
However nothing matches:
mles:drive-ios-swift mles$ ./test.sh
build-test/com.project.drive-appstore.ipa Does not match
build-test/com.project.drive_v2.6.0._20170728_1156.ipa Does not match
build-test/com.project.test.swift.dev-enterprise.ipa Does not match
build-test/*.apk Does not match
What would the correct script look like to match build-test/com.project.drive-appstore.ipa?
You are confusing a glob match with a regex match. For a glob match with * you can just use the test operator with ==:
#!/usr/bin/env bash
fileNameGlob='*appstore*'
# ^^^^^^^^^^^^ Single quote the glob string
for appFile in build-test/*{.ipa,.apk}; do
# To skip non-existent files
[[ -e $appFile ]] || continue
if [[ $appFile == *${fileNameGlob}* ]]; then
echo "$appFile Matches"
else
echo "$appFile Does not match"
fi
done
produces a result
build-test/com.project.drive_v2.6.0._20170728_1156.ipa Does not match
build-test/com.project.drive-appstore.ipa Matches
build-test/com.project.test.swift.dev-enterprise.ipa Does not match
(or) with a regex use greedy match .* as
fileNameRegex='.*appstore.*'
if [[ $appFile =~ ${fileNameRegex} ]]; then
# rest of the code
That said, to meet your original requirement of matching either enterprise or appstore in the file name, use extended glob matches in bash.
Using glob:
shopt -s nullglob
shopt -s extglob
fileExtGlob='*+(enterprise|appstore)*'
if [[ $appFile == ${fileExtGlob} ]]; then
# rest of the code
and with regex,
fileNameRegex2='enterprise|appstore'
if [[ $appFile =~ ${fileNameRegex2} ]]; then
# rest of the code
You can use the following regex to match appstore and enterprise in a filename:
for i in build-test/*; do if [[ $i =~ appstore|enterprise ]]; then echo $i; fi; done
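For illustration, the alternation regex from the answers above can be wrapped in a small function that reports which word was found. This is only a sketch: flavour_of is a hypothetical name, and the file names are the ones from the question, passed as plain strings so that no build-test directory is needed:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print which build flavour a file name mentions.
flavour_of() {
    local re='appstore|enterprise'
    if [[ $1 =~ $re ]]; then
        printf '%s\n' "${BASH_REMATCH[0]}"   # the text that matched
    else
        printf 'other\n'
    fi
}

for appFile in \
    build-test/com.project.drive-appstore.ipa \
    build-test/com.project.test.swift.dev-enterprise.ipa \
    build-test/com.project.drive_v2.6.0._20170728_1156.ipa
do
    echo "$appFile: $(flavour_of "$appFile")"
done
```

Because =~ matches anywhere in the string, the pattern needs no surrounding .* here.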

How can I run a regex against a filename?

In a list of files:
javascript-custom-rules-plugin-1.0-SNAPSHOT.jar
README.txt
sonar-build-breaker-plugin-2.0.jar
sonar-javascript-plugin-2.11.jar
tmo-custom-rules-1.0.jar
I am attempting to match these filenames by regex.
My Script
#!/usr/bin/env bash
install_location=/usr/local/sonar/extensions/plugins
for f in $(ls -1 $install_location)
do
# remove any previous versions of this plugin
if [[ "$f" =~ ".*tmo-custom-rules-(.+)\.jar" ]]
then
echo "found $f. will remove"
else
echo "$f doesn't match"
fi
done
I've tried if [[ "$f" =~ ".*tmo-custom-rules-(.+)\.jar" ]] and if [[ "$f" == *"tmo-custom-rules" ]] to no avail.
I'm getting
javascript-custom-rules-plugin-1.0-SNAPSHOT.jar doesn't match
README.txt doesn't match
sonar-build-breaker-plugin-2.0.jar doesn't match
sonar-javascript-plugin-2.11.jar doesn't match
tmo-custom-rules-1.0.jar doesn't match
when I expect found tmo-custom-rules-1.0.jar. will remove
I've run my regular expression through many regular expression testers with the data above, and they all return the correct matches, but I can't get it to work here in my script.
How can I loop through, and check to see if any files matches this regular expression?
In Bash the regex must be unquoted, so this should work:
[[ $f =~ .*tmo-custom-rules-(.+)\.jar ]]
Or better:
re=".*tmo-custom-rules-(.+)\.jar"
[[ $f =~ $re ]]
However you don't even need regex and can use shell glob matching:
#!/usr/bin/env bash
install_location=/usr/local/sonar/extensions/plugins
for f in "$install_location"/*
do
# remove any previous versions of this plugin
if [[ $f == *tmo-custom-rules-*.jar ]]
then
echo "found $f. will remove"
else
echo "$f doesn't match"
fi
done
Note that this also avoids parsing the output of ls, which is not always fit for scripting.
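If the version number is needed as well, the regex form has one advantage over the glob: the capture group ends up in BASH_REMATCH. A minimal sketch, assuming a made-up helper name plugin_version and a pattern anchored to the base name of the path:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the <version> part of tmo-custom-rules-<version>.jar.
plugin_version() {
    local re='^tmo-custom-rules-(.+)\.jar$'
    # ${1##*/} strips any leading directory, so full paths work too.
    [[ ${1##*/} =~ $re ]] || return 1
    printf '%s\n' "${BASH_REMATCH[1]}"
}

for f in tmo-custom-rules-1.0.jar sonar-javascript-plugin-2.11.jar README.txt; do
    if version=$(plugin_version "$f"); then
        echo "found $f (version $version). will remove"
    else
        echo "$f doesn't match"
    fi
done
```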
You can do this with expr using the colon operator:
if expr "$f" : '.*tmo-custom-rules-.*\.jar' > /dev/null; then
echo matches
fi
Note that the regular expressions in this context are assumed to be anchored to the start of the line.
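The difference in anchoring between expr and =~ can be checked with a short sketch like the one below. The wrapper names expr_match and bash_match are invented for this example:

```shell
#!/usr/bin/env bash
# Hypothetical wrappers so both tools can be called the same way.
expr_match() { expr "$1" : "$2" >/dev/null; }   # anchored at the start of the string
bash_match() { [[ $1 =~ $2 ]]; }                # matches anywhere in the string

f='tmo-custom-rules-1.0.jar'

expr_match "$f" 'custom-rules'   && echo "expr: matched" || echo "expr: no match"
expr_match "$f" '.*custom-rules' && echo "expr with leading .*: matched"
bash_match "$f" 'custom-rules'   && echo "=~: matched"
```

The first expr call fails because the string does not begin with custom-rules, which is why the answer above starts its pattern with .*.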

Native bash regexp [[ $f =~ "^[^\.]+$" ]] never matching

I'm currently trying to loop through all files in a certain directory using bash. If the file matches the following regular expression, it outputs the filename. If it doesn't, it outputs 'not' and then the filename. The regular expression is supposed to filter out any files that have a '.' in them.
for f in * ; do
if [[ $f =~ "^[^\.]+$" ]]; then
echo "$f"
else
echo "not $f"
fi
done
It correctly loops through all the files, but for a reason that has stumped me for quite a while, I cannot get it to only exclude files with a '.' in them. For example, in a directory with the following files:
bashrc
gitconfig
install.sh
README.md
vimrc
the output of the script is such:
not bashrc
not gitconfig
not install.sh
not README.md
not vimrc
I validated the regular expression here. Any thoughts?
Don't quote the right-hand side of your expression.
if [[ $f =~ ^[^.]+$ ]]; then
Quotes make the string a literal substring, rather than a regular expression.
For better portability across bash versions, put your regex in a variable (single-quoted, which will make the backslash literal):
re='^[^.]+$'
if [[ $f =~ $re ]]; then
That said, you could do this with an extglob as well:
shopt -s extglob # enable extended globs
for f in +([!.]); do
printf 'Matched %q\n' "$f"
done
...or with a general-purpose pattern match:
for f in *; do
if [[ $f = *.* ]]; then
printf '%q contains a dot\n' "$f"
else
printf '%q does not contain a dot\n' "$f"
fi
done
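The effect of quoting the right-hand side can be seen side by side in a sketch like this one. The function names are hypothetical, and the behaviour shown for the quoted form assumes a reasonably recent bash where quoted patterns are matched as literal text:

```shell
#!/usr/bin/env bash
re='^[^.]+$'

# Hypothetical helpers: same regex, once unquoted and once quoted.
no_dot()        { [[ $1 =~ $re ]]; }     # unquoted: treated as a regex
no_dot_quoted() { [[ $1 =~ "$re" ]]; }   # quoted: treated as literal text

for f in bashrc gitconfig install.sh README.md; do
    if no_dot "$f"; then
        echo "$f"
    else
        echo "not $f"
    fi
done

# The quoted form only succeeds if the string contains the pattern text itself.
no_dot_quoted bashrc || echo "quoted form does not match bashrc"
```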

bash if [[ =~ regex compare not working?

I have a value in a variable that may be absolute or relative url, and I need to check which one it is.
I have found that there's a =~ operator in [[, but I can't get it to work. What am I doing wrong?
url="http://test"
if [[ "$url" =~ "^http://" ]];
then echo "absolute.";
fi;
You need to use regex without quote:
url="http://test"
if [[ "$url" =~ ^http:// ]]; then
echo "absolute."
fi
This outputs absolute. because the regex needs to be unquoted in newer Bash (after Bash v3.1).
Or avoid regex and use glob matching:
if [[ "$url" == "http://"* ]]; then
echo "absolute."
fi
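A small function makes the check reusable. The sketch below goes slightly beyond the question by accepting any scheme followed by ://, not only http; that generalisation and the name is_absolute_url are assumptions for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the argument starts with "<scheme>://".
# Assumption: any scheme counts as absolute, not just http.
is_absolute_url() {
    local re='^[A-Za-z][A-Za-z0-9+.-]*://'
    [[ $1 =~ $re ]]
}

for url in "http://test" "https://example.com/a" "/images/logo.png" "docs/index.html"; do
    if is_absolute_url "$url"; then
        echo "$url: absolute"
    else
        echo "$url: relative"
    fi
done
```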

Perl Regex not working in Bash script?

I have a regular expression:
^(.+?)(\.[^.]+$|$)
which separates a file name and the file extension (if there is one)
http://movingtofreedom.org/2008/04/01/regex-match-filename-base-and-extension/
Works perfectly fine in Perl
Say $FILE ='.myfile.form.txt'
$1 is '.myfile.form' and
$2 is '.txt', as they should be
I know Bash regex and Perl regex aren't the same, but I've never had a problem with Bash Rematching until now
But when I try to use it in a Bash script as, say...
FILE='.myfile.form.txt'
[[ $FILE =~ ^(.+?)(\.[^.]+$|$) ]]
${BASH_REMATCH[1]} will just have the entire file name (.myfile.form.txt), and nothing in ${BASH_REMATCH[2]}
I'm wondering what's wrong/going on here
Thanks for any help!
regex(7) which is referenced by regex(3) which is referenced by bash(1) makes no mention of greediness modifiers. Your pattern cannot be implemented in bash regex.
This doesn't mean you can't achieve what you want, though.
[[ $FILE =~ ^(.+)(\.[^.]*)$ ]] || [[ $FILE =~ ^(.*)()$ ]]
file="${BASH_REMATCH[1]}"
ext="${BASH_REMATCH[2]}"
Or something more straightforward like
if [[ $FILE =~ ^(.+)(\.[^.]*)$ ]]; then
file="${BASH_REMATCH[1]}"
ext="${BASH_REMATCH[2]}"
else
file="$FILE"
ext=""
fi
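The if/else variant above can be packaged as a function for reuse. This is a sketch only: split_name is a made-up name, and it reports its results through the global variables base and ext, which is a design choice of this example rather than anything from the answer:

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a file name into base and extension.
# Results are left in the global variables "base" and "ext".
split_name() {
    local re='^(.+)(\.[^.]*)$'
    if [[ $1 =~ $re ]]; then
        base=${BASH_REMATCH[1]}
        ext=${BASH_REMATCH[2]}
    else
        # No dot after the first character: the whole name is the base.
        base=$1
        ext=''
    fi
}

for name in .myfile.form.txt README .bashrc; do
    split_name "$name"
    printf 'name=%s base=%s ext=%s\n' "$name" "$base" "$ext"
done
```

Since .+ needs at least one character before the final dot, a name such as .bashrc falls through to the else branch and keeps its leading dot in base.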