Bash script for processing directories (REGEX)

I can't seem to get the regex in my bash script to work:
#!/bin/bash
cd /var/lib/gitolite/repositories
for D in *; do
if [ -d "${D}" ]; then
if [["${D}" = [^[0-9A-Za-z\-_].git$] ]]; then
echo "${D}"
fi
fi
done
Possible names of directories:
test.git
test-admin12.git
test_admin.git
test_admin.git.bkp (these are the folders I DON'T want)
I don't want to launch a secondary process like: sed or grep or ls

Instead of:
if [["${D}" = [^[0-9A-Za-z\-_].git$] ]]; then
echo "${D}"
fi
Use this condition with correct syntax and correct regex:
[[ "${D}" =~ ^[0-9A-Za-z_-]+\.git$ ]] && echo "${D}"
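One way to check that pattern without touching the real repository directory is to run it over the sample names from the question. This is only a sketch: the helper name is_repo_name is made up for illustration, and the regex is kept in a single-quoted variable to sidestep quoting differences between bash versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the question): succeeds when the name
# consists only of letters, digits, "_" or "-" followed by ".git".
is_repo_name() {
    local re='^[0-9A-Za-z_-]+\.git$'
    [[ $1 =~ $re ]]
}

# Sample names taken from the question.
for name in test.git test-admin12.git test_admin.git test_admin.git.bkp; do
    if is_repo_name "$name"; then
        echo "match:    $name"
    else
        echo "no match: $name"
    fi
done
```

Only test_admin.git.bkp should be reported as "no match", because the $ anchor requires the name to end in .git.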

This worked for me (slight change to your if statement):
for D in *; do
if [ -d "${D}" ]; then
if [[ "${D}" =~ ^[0-9A-Za-z_-]*\.git$ ]]; then
echo "${D}"
fi
fi
done

You can write a single glob to match valid directories if you enable extended patterns.
shopt -s extglob
for D in +([[:alnum:]_-]).git/; do
echo "$D"
done
The notation +(...) matches one or more occurrences of the enclosed pattern. The bracket expression matches a single alphanumeric character, underscore, or hyphen (the hyphen does not need to be escaped when it is the last character in the brackets). The trailing slash limits matches to directories.
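To see the effect of the trailing slash, the glob can be tried against a throwaway directory tree. This is a sketch under assumptions: list_repo_dirs is a hypothetical helper, the temporary directory stands in for the real repositories path, and nullglob is enabled so that an empty result produces no output.

```shell
#!/usr/bin/env bash
# extglob enables +(...); nullglob makes a non-matching glob expand to nothing.
shopt -s extglob nullglob

# Hypothetical helper: print the names of matching directories under "$1".
list_repo_dirs() {
    local D
    for D in "$1"/+([[:alnum:]_-]).git/; do
        D=${D%/}            # drop the trailing slash added by the glob
        echo "${D##*/}"     # print only the last path component
    done
}

# Throwaway demo tree (an assumption, not the real gitolite path).
demo=$(mktemp -d)
mkdir "$demo/test.git" "$demo/test-admin12.git" "$demo/test_admin.git.bkp"
touch "$demo/file.git"      # a regular file: skipped because of the trailing slash
list_repo_dirs "$demo"
rm -rf "$demo"
```

The .bkp directory is skipped because the pattern must end in .git, and file.git is skipped because it is not a directory.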

Related

Extract integers from string with bash

From a variable how to extract integers that will be in format *\d+.\d+.\d+* (4.12.3123) using bash.
filename="xzxzxzxz4.12.3123fsfsfsfsfsfs"
I have tried:
filename="xzxzxzxz4.12.3123fsfsfsfsfsfs"
if [[ "$filename" =~ (.*)(\d+.\d+.\d+)(.*) ]]; then
echo ${BASH_REMATCH}
echo ${BASH_REMATCH[1]}
echo ${BASH_REMATCH[2]}
echo ${BASH_REMATCH[3]}
else
echo 'nej'
fi
which does not work.
The easiest way to work with regexes in Bash, in terms of consistency between Bash versions and escaping, is to put the regex into a single-quoted variable and then use it unquoted, as below:
re='[0-9]+\.[0-9]+\.[0-9]+'
[[ $filename =~ $re ]] && printf '%s\n' "${BASH_REMATCH[@]}"
The main issue with your approach was that you were using the "Perl-style" \d, which bash's regex syntax does not support, so in fact you could make your code work with:
if [[ "$filename" =~ (.*)([0-9]+\.[0-9]+\.[0-9]+)(.*) ]]; then
echo "${BASH_REMATCH[2]}"
fi
But this unnecessarily creates 3 capture groups, when you don't even need one. Note that I also changed . (any character) to \. (a literal .).
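Putting the pieces together, a small function can return the first version-like substring without any capture groups. This is a minimal sketch; the name extract_version is an assumption made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first x.y.z number found in "$1",
# or return non-zero when there is none.
extract_version() {
    local re='[0-9]+\.[0-9]+\.[0-9]+'
    [[ $1 =~ $re ]] || return 1
    # BASH_REMATCH[0] holds the text matched by the whole regex.
    printf '%s\n' "${BASH_REMATCH[0]}"
}

filename="xzxzxzxz4.12.3123fsfsfsfsfsfs"
if version=$(extract_version "$filename"); then
    echo "version: $version"
else
    echo "no version found"
fi
```

Because the regex is unanchored, it finds the version wherever it sits in the string, so the surrounding (.*) groups are not needed.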
One way to extract it, with GNU grep (-P is a GNU extension):
grep -oP '\d+\.\d+\.\d+' <<< "$filename"
There is one more way, using GNU awk (the three-argument match() is a gawk extension):
$ filename="xzxzxzxz4.12.3123fsfsfsfsfsfs"
$ awk '{ if (match($0, /[0-9]+\.[0-9]+\.[0-9]+/, m)) print m[0] }' <<< "$filename"
4.12.3123

Regex not matching name in filepath

I have a folder with ipa files. I need to identify them by whether the filename contains appstore or enterprise.
mles:drive-ios-swift mles$ ls build
com.project.drive-appstore.ipa
com.project.test.swift.dev-enterprise.ipa
com.project.drive_v2.6.0._20170728_1156.ipa
I've tried:
#!/bin/bash -veE
fileNameRegex="**appstore**"
for appFile in build-test/*{.ipa,.apk}; do
if [[ $appFile =~ $fileNameRegex ]]; then
echo "$appFile Matches"
else
echo "$appFile Does not match"
fi
done
However nothing matches:
mles:drive-ios-swift mles$ ./test.sh
build-test/com.project.drive-appstore.ipa Does not match
build-test/com.project.drive_v2.6.0._20170728_1156.ipa Does not match
build-test/com.project.test.swift.dev-enterprise.ipa Does not match
build-test/*.apk Does not match
What would the correct script look like to match build-test/com.project.drive-appstore.ipa?
You are confusing a glob match with a regex match. For a simple glob match with * you can use the [[ test operator with ==:
#!/usr/bin/env bash
fileNameGlob='*appstore*'
# ^^^^^^^^^^^^ Single-quote the glob pattern string
for appFile in build-test/*{.ipa,.apk}; do
# To skip non-existent files
[[ -e $appFile ]] || continue
if [[ $appFile == *${fileNameGlob}* ]]; then
echo "$appFile Matches"
else
echo "$appFile Does not match"
fi
done
produces a result
build-test/com.project.drive_v2.6.0._20170728_1156.ipa Does not match
build-test/com.project.drive-appstore.ipa Matches
build-test/com.project.test.swift.dev-enterprise.ipa Does not match
(or) with a regex, use .* where the glob used *:
fileNameRegex='.*appstore.*'
if [[ $appFile =~ ${fileNameRegex} ]]; then
# rest of the code
That said, to meet your original requirement of matching either enterprise or appstore in the file name, use an extended glob in bash.
Using glob:
shopt -s nullglob
shopt -s extglob
fileExtGlob='*+(enterprise|appstore)*'
if [[ $appFile == ${fileExtGlob} ]]; then
# rest of the code
and with regex,
fileNameRegex2='enterprise|appstore'
if [[ $appFile =~ ${fileNameRegex2} ]]; then
# rest of the code
You can use the following regex to match appstore or enterprise in a filename:
for i in build-test/*; do if [[ $i =~ appstore|enterprise ]]; then echo "$i"; fi; done
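If the script also needs to know which of the two words matched, a capture group can report it. The following is only a sketch: classify_build is a hypothetical helper, and the file names are the ones listed in the question.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "appstore", "enterprise", or "other"
# depending on which word appears in the file name.
classify_build() {
    local re='(appstore|enterprise)'
    if [[ $1 =~ $re ]]; then
        echo "${BASH_REMATCH[1]}"   # text captured by the parenthesized group
    else
        echo "other"
    fi
}

for f in com.project.drive-appstore.ipa \
         com.project.test.swift.dev-enterprise.ipa \
         com.project.drive_v2.6.0._20170728_1156.ipa; do
    printf '%s -> %s\n' "$f" "$(classify_build "$f")"
done
```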

How can I run a regex against a filename?

In a list of files:
javascript-custom-rules-plugin-1.0-SNAPSHOT.jar
README.txt
sonar-build-breaker-plugin-2.0.jar
sonar-javascript-plugin-2.11.jar
tmo-custom-rules-1.0.jar
I am attempting to match these filenames by regex.
My Script
#!/usr/bin/env bash
install_location=/usr/local/sonar/extensions/plugins
for f in $(ls -1 $install_location)
do
# remove any previous versions of this plugin
if [[ "$f" =~ ".*tmo-custom-rules-(.+)\.jar" ]]
then
echo "found $f. will remove"
else
echo "$f doesn't match"
fi
done
I've tried if [[ "$f" =~ ".*tmo-custom-rules-(.+)\.jar" ]] and if [[ "$f" == *"tmo-custom-rules" ]] to no avail.
I'm getting
javascript-custom-rules-plugin-1.0-SNAPSHOT.jar doesn't match
README.txt doesn't match
sonar-build-breaker-plugin-2.0.jar doesn't match
sonar-javascript-plugin-2.11.jar doesn't match
tmo-custom-rules-1.0.jar doesn't match
when I expect found tmo-custom-rules-1.0.jar. will remove
I've run my regular expression through many regular expression testers with the data above, and they all return the correct matches, but I can't get it to work here in my script.
How can I loop through, and check to see if any files matches this regular expression?
In bash, the regex on the right-hand side of =~ must be unquoted, so this should work:
[[ $f =~ .*tmo-custom-rules-(.+)\.jar ]]
Or better:
re='.*tmo-custom-rules-(.+)\.jar'
[[ $f =~ $re ]]
However you don't even need regex and can use shell glob matching:
#!/usr/bin/env bash
install_location=/usr/local/sonar/extensions/plugins
for f in "$install_location"/*
do
# remove any previous versions of this plugin
if [[ $f == *tmo-custom-rules-*.jar ]]
then
echo "found $f. will remove"
else
echo "$f doesn't match"
fi
done
Note that this avoids parsing the output of ls, which is not reliable in scripts.
You can do this with expr using the colon operator:
if expr "$f" : '.*tmo-custom-rules-.*\.jar' > /dev/null; then
echo matches
fi
Note that regular expressions in this context are implicitly anchored to the start of the string.
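The difference in anchoring between expr and bash's =~ can be seen by running the same pattern through both. This is a sketch with two hypothetical wrapper functions; the sample path is an assumption chosen so that the match does not begin at the start of the string.

```shell
#!/usr/bin/env bash
# Hypothetical wrappers for illustration.
expr_matches() { expr "$1" : "$2" > /dev/null; }   # implicitly anchored at the start
bash_matches() { [[ $1 =~ $2 ]]; }                 # matches anywhere unless anchored

f="plugins/tmo-custom-rules-1.0.jar"   # assumed sample path
pattern='tmo-custom-rules-.*\.jar'

if expr_matches "$f" "$pattern"; then
    echo "expr: match"
else
    echo "expr: no match (pattern must match from the first character)"
fi

if expr_matches "$f" ".*$pattern"; then
    echo "expr with leading .*: match"
fi

if bash_matches "$f" "$pattern"; then
    echo "bash =~: match (unanchored)"
fi
```

This is why the expr version above starts its pattern with .* while the =~ version does not strictly need it.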

Native bash regexp [[ $f =~ "^[^\.]+$" ]] never matching

I'm currently trying to loop through all files in a certain directory using bash. If the file matches the following regular expression, it outputs the filename. If it doesn't, it outputs 'not' and then the filename. The regular expression is supposed to filter out any files that have a '.' in them.
for f in * ; do
if [[ $f =~ "^[^\.]+$" ]]; then
echo "$f"
else
echo "not $f"
fi
done
It correctly loops through all the files, but for a reason that has stumped me for quite a while, I cannot get it to only exclude files with a '.' in them. For example, in a directory with the following files:
bashrc
gitconfig
install.sh
README.md
vimrc
the output of the script is such:
not bashrc
not gitconfig
not install.sh
not README.md
not vimrc
I validated the regular expression here. Any thoughts?
Don't quote the right-hand side of your expression.
if [[ $f =~ ^[^.]+$ ]]; then
Quotes make the string a literal substring, rather than a regular expression.
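The effect of the quotes can be checked directly by comparing the two forms on the same names. A minimal sketch, assuming bash 3.2 or later with default compatibility settings; both function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Assumes bash >= 3.2 with default compatibility settings.
# Quoted right-hand side: searched for as a literal substring.
has_no_dot_quoted()   { [[ $1 =~ "^[^.]+$" ]]; }
# Unquoted right-hand side: interpreted as a regular expression.
has_no_dot_unquoted() { [[ $1 =~ ^[^.]+$ ]]; }

for f in bashrc install.sh; do
    if has_no_dot_unquoted "$f"; then
        echo "unquoted: $f matches"
    else
        echo "unquoted: $f does not match"
    fi
    if has_no_dot_quoted "$f"; then
        echo "quoted:   $f matches"
    else
        echo "quoted:   $f does not match"
    fi
done
```

The quoted form only succeeds for a name that literally contains the characters ^[^.]+$, which explains why every file in the question was reported as "not".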
For better portability across bash versions, put your regex in a variable (single-quoted, so that any backslash in it stays literal):
re='^[^.]+$'
if [[ $f =~ $re ]]; then
That said, you could do this with an extglob as well:
shopt -s extglob # enable extended globs
for f in +([!.]); do
printf 'Matched %q\n' "$f"
done
...or with a general-purpose pattern match:
for f in *; do
if [[ $f = *.* ]]; then
printf '%q contains a dot\n' "$f"
else
printf '%q does not contain a dot\n' "$f"
fi
done

use regular expression in if-condition in bash

What is the general rule for using a regular expression in an if clause in bash?
Here is an example
$ gg=svm-grid-ch
$ if [[ $gg == *grid* ]] ; then echo $gg; fi
svm-grid-ch
$ if [[ $gg == ^....grid* ]] ; then echo $gg; fi
$ if [[ $gg == ....grid* ]] ; then echo $gg; fi
$ if [[ $gg == s...grid* ]] ; then echo $gg; fi
$
Why do the last three fail to match?
Hope you could give as many general rules as possible, not just for this example.
When using a glob pattern, a question mark represents a single character and an asterisk represents a sequence of zero or more characters:
if [[ $gg == ????grid* ]] ; then echo $gg; fi
When using a regular expression, a dot represents a single character and an asterisk represents zero or more of the preceding character. So ".*" represents zero or more of any character, "a*" represents zero or more "a", "[0-9]*" represents zero or more digits. Another useful one (among many) is the plus sign, which represents one or more of the preceding character. So "[a-z]+" represents one or more lowercase alpha characters (in the C locale, and some others).
if [[ $gg =~ ^....grid.*$ ]] ; then echo $gg; fi
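The two syntaxes can be compared side by side on the same string. This is only a sketch; glob_match and regex_match are hypothetical helpers, and the pattern argument is deliberately left unquoted inside [[ ]] so that it is treated as a pattern rather than a literal string.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration. $2 is unquoted on purpose.
glob_match()  { [[ $1 == $2 ]]; }   # glob: must match the whole string
regex_match() { [[ $1 =~ $2 ]]; }   # regex: may match any substring

gg=svm-grid-ch
glob_match  "$gg" '????grid*'    && echo "glob  ????grid*     matches"
glob_match  "$gg" '....grid*'    || echo "glob  ....grid*     does not match (dots are literal in a glob)"
regex_match "$gg" '^....grid.*$' && echo "regex ^....grid.*\$  matches"
regex_match "$gg" 'grid'         && echo "regex grid          matches (unanchored substring)"
glob_match  "$gg" 'grid'         || echo "glob  grid          does not match (no wildcards around it)"
```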
Use =~ for a regular expression check:
if [[ $gg =~ ^....grid.* ]]
Adding this solution with grep and basic sh builtins for those interested in a more portable solution (independent of bash version; also works with plain old sh, on non-Linux platforms etc.)
# GLOB matching
gg=svm-grid-ch
case "$gg" in
*grid*) echo $gg ;;
esac
# REGEXP (the glob's trailing "*" is dropped: in a regex it would mean "zero or more d")
if echo "$gg" | grep '^....grid' >/dev/null ; then echo "$gg" ; fi
if echo "$gg" | grep '....grid' >/dev/null ; then echo "$gg" ; fi
if echo "$gg" | grep 's...grid' >/dev/null ; then echo "$gg" ; fi
# Extended REGEXP
if echo "$gg" | grep -E '(^....grid|....grid|s...grid)' >/dev/null ; then
echo "$gg"
fi
Some grep incarnations also support the -q (quiet) option as an alternative to redirecting to /dev/null, but the redirect is again the most portable.
@OP,
Is glob pattern not only used for file names?
No, a "glob" pattern is not only used for file names. You can use it to compare strings as well. In your examples, you can use case/esac to look for string patterns.
gg=svm-grid-ch
# looking for the word "grid" in the string $gg
case "$gg" in
*grid* ) echo "found";;
esac
# [[ $gg =~ ^....grid* ]]
case "$gg" in ????grid*) echo "found";; esac
# [[ $gg =~ s...grid* ]]
case "$gg" in s???grid*) echo "found";; esac
In bash, when to use glob pattern and when to use regular expression? Thanks!
Regexes are more versatile and "convenient" than "glob patterns"; however, unless you are doing complex tasks that "globbing/extended globbing" cannot provide easily, there's no need to use regex.
The =~ operator is not available in older versions of bash (it was added in 3.0, and its quoting rules changed in 3.2, as dennis mentioned), but you can still use extended globbing there (by setting extglob).
Update for OP: an example, using regex, that finds files whose names start with 2 characters (each dot "." means 1 character) followed by "g".
Example output:
$ shopt -s dotglob
$ ls -1 *
abg
degree
..g
$ for file in *; do [[ $file =~ ^..g ]] && echo "$file" ; done
abg
degree
..g
In the above, the files are matched because their names start with 2 characters followed by "g" (i.e. ..g).
The equivalent with globbing will be something like this: (look at reference for meaning of ? and * )
$ for file in ??g*; do echo $file; done
abg
degree
..g