I'm trying to write a regexp (in bash) that identifies files with only the following extensions:
tgz, tar.gz, TGZ and TAR.GZ.
I tried several but can't get any of them to work.
I'm using this regexp to select only files with those extensions so I can do some work with them:
if [ -f $myregexp ]; then
.....
fi
thanks.
Try this:
#!/bin/bash
# no case match
shopt -s nocasematch
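# one group, anchored once, so both alternatives require the leading dot
matchRegex='\.(tgz|tar\.gz)$'
for f in *
do
# display filtered files; the regex variable must stay unquoted,
# otherwise bash treats it as a literal string
[[ -f "$f" ]] && [[ "$f" =~ $matchRegex ]] && echo "$f"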
done
I have found an elegant way of doing this:
shopt -s nocasematch
for file in *;
do
[[ "$file" =~ \.(tar\.gz|tgz)$ ]] && echo "$file"
done
This may suit you since you seem to want to use if with a bash regex. The =~ operator checks whether the string on its left matches the regular expression on its right. shopt -s nocasematch has to be set to perform a case-insensitive match.
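A minimal sketch of the same idea wrapped in a function, in case that is easier to drop into an if. The name is_tarball is made up for illustration, and the body runs in a subshell so that nocasematch does not leak into the rest of the script:

```bash
#!/usr/bin/env bash
# is_tarball NAME: succeed if NAME ends in .tgz or .tar.gz, in any letter case.
# (Hypothetical helper name; not part of the original answers.)
is_tarball() (
    shopt -s nocasematch          # case-insensitive =~, local to this subshell
    re='\.(tgz|tar\.gz)$'         # literal dots, anchored at the end of the name
    [[ $1 =~ $re ]]
)

for name in backup.tgz site.TAR.GZ notes.txt; do
    if is_tarball "$name"; then
        echo "$name: archive"
    else
        echo "$name: skipped"
    fi
done
```

Keeping the regex in a variable and expanding it unquoted avoids the quoting pitfalls of writing it inline.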
Use this pattern
.*\.(tgz|tar\.gz)$
But how do you make a regular expression case-insensitive? It depends on the language you use. In JavaScript you write /pattern/i, where the i flag makes the search case-insensitive. In C# you pass RegexOptions.IgnoreCase.
It depends on where you want to use this regex. With grep, use the -E (extended regex) and -i ("ignore case") options, and put both alternatives in one group so the $ anchor applies to each of them:
grep -Ei '\.(tgz|tar\.gz)$'
Write 4 regexes, and check whether the file name matches any of them. Or write 2 case-insensitive regexes.
This way the code will be much more readable (and easier) than writing 1 regex.
You can even do it without a regex (a bit wordy though):
for f in *.[Tt][Gg][Zz] *.[Tt][Aa][Rr].[Gg][Zz]; do
echo "$f"
done
In bash? Use curly brackets, *.{tar.gz,tgz,TAR.GZ,TGZ} or even *.{t{ar.,}gz,T{AR.,}GZ}. Thus, ls -l *.{t{ar.,}gz,T{AR.,}GZ} on the command-line will do a detailed listing of all files with the matching extensions.
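A rough sketch of the brace pattern inside a loop. The function name list_tarballs is invented here, and nullglob is an addition of mine: it makes patterns that match nothing disappear instead of being passed on as literal strings.

```bash
#!/usr/bin/env bash
# list_tarballs DIR: print regular files in DIR with one of the four spellings.
# (Hypothetical helper; runs in a subshell so cd and nullglob stay local.)
list_tarballs() (
    cd "$1" || exit 1
    shopt -s nullglob
    # Brace expansion runs first and yields: *.tar.gz *.tgz *.TAR.GZ *.TGZ
    for f in *.{t{ar.,}gz,T{AR.,}GZ}; do
        if [[ -f $f ]]; then
            printf '%s\n' "$f"
        fi
    done
)

list_tarballs .
```

Note that this only covers the four exact spellings; a mixed-case name such as file.Tgz is not matched.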
I have a regex pattern ([0-9]*\.)+[0-9]*_to_([0-9]*\.)+[0-9]*; a matching string would be 11.1.1.1_to_21.1.1.1. I want to find all files under a directory whose names match this pattern.
However, I am not able to get it working with the code below. I tried to escape ( and ) by adding \ before them, but that did not work.
dir=$SCRIPT_PATH/oaa_partition/upgrade/([0-9]*\.)+[0-9]*_to_([0-9]*\.)+[0-9]*.sql
for FILE in $dir; do
echo $FILE
done
I was only able to get something like this to work:
dir=$SCRIPT_PATH/oaa_partition/upgrade/[0-9]*_to_*.sql
for FILE in $dir; do
echo $FILE
done
I need some help using the full regex pattern ([0-9]*\.)+[0-9]*_to_([0-9]*\.)+[0-9]* here.
Your regex is simple enough to replace with a bash extglob:
#!/bin/bash
shopt -s extglob
glob='+(*([0-9]).)*([0-9])_to_+(*([0-9]).)*([0-9]).sql'
for file in "$SCRIPT_PATH"/oaa_partition/upgrade/$glob
do
printf '%q\n' "$file"
done
If the regex is too complex for translating it to extended globs then you can filter the files using a bash regex inside the for loop:
#!/bin/bash
# the \.sql suffix is added in the [[ ]] test below, so it is not repeated here
regex='([0-9]*\.)+[0-9]*_to_([0-9]*\.)+[0-9]*'
for file in "$SCRIPT_PATH"/oaa_partition/upgrade/*_to_*.sql
do
[[ $file =~ /$regex\.sql$ ]] || continue
printf '%q\n' "$file"
done
BTW, as it is, your regex could match a lot of unwanted things, for example: 0._to_..sql.
If this is enough for differentiating the targeted files from the others then you can probably just use the basic glob
[0-9]*_to_[0-9]*.sql
To fix the regex you would want to match at least 1 number before the dot, and if you go with it, a literal dot before the sql
([0-9]+\.)+[0-9]*_to_([0-9]+\.)+[0-9]*\.sql
https://regex101.com/r/5xB3Bt/1
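For what it's worth, here is a small sketch that tries the stricter regex against a few names directly in bash. The ^ and $ anchors and the helper name is_upgrade_script are my additions, on the assumption that the whole file name (without its directory) should match:

```bash
#!/usr/bin/env bash
# Anchored form of the stricter regex; the anchors are an assumption
# so that the entire file name has to match.
version_re='^([0-9]+\.)+[0-9]*_to_([0-9]+\.)+[0-9]*\.sql$'

# is_upgrade_script NAME: hypothetical helper that tests a bare file name.
is_upgrade_script() {
    [[ $1 =~ $version_re ]]
}

for name in 11.1.1.1_to_21.1.1.1.sql 0._to_..sql readme_to_me.sql; do
    if is_upgrade_script "$name"; then
        echo "match:    $name"
    else
        echo "no match: $name"
    fi
done
```

Only the first of the three sample names should be reported as a match.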
You cannot use a regular expression in the word list of a for loop. It only supports glob patterns, which are less expressive than a regex.
You will have to use your regex with the GNU find command instead:
find . -mindepth 1 -maxdepth 1 -regextype egrep -regex '.*/([0-9]*\.)+[0-9]*_to_([0-9]*\.)+[0-9]*\.sql'
To loop these entries:
while IFS= read -rd '' file; do
echo "$file"
done < <(find . -mindepth 1 -maxdepth 1 -regextype egrep -regex '.*/([0-9]*\.)+[0-9]*_to_([0-9]*\.)+[0-9]*\.sql' -print0)
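A sketch of the NUL-delimited loop collecting the results into an array. To keep it independent of find's regex flavour, this version (my assumption, not the answer's exact command) lets find do a coarse -name filter and applies the regex in bash; collect_sql and sql_files are invented names:

```bash
#!/usr/bin/env bash
# collect_sql DIR: fill the global array sql_files with matching paths.
# -print0 makes find emit NUL-terminated names, which is what
# `read -d ''` expects, so names with spaces or newlines survive intact.
collect_sql() {
    local dir=$1 file
    local re='/([0-9]+\.)+[0-9]*_to_([0-9]+\.)+[0-9]*\.sql$'
    sql_files=()
    while IFS= read -r -d '' file; do
        if [[ $file =~ $re ]]; then
            sql_files+=("$file")
        fi
    done < <(find "$dir" -mindepth 1 -maxdepth 1 -type f -name '*_to_*.sql' -print0)
}

collect_sql .
printf 'found %d file(s)\n' "${#sql_files[@]}"
```

Without -print0, read -d '' sees no delimiter at all and the loop body never runs, so the two always go together.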
I want to remove all files whose names contain a given substring; if a name does not contain it, I want to ignore that file. So I use a regular expression:
str=9009
patt=*v[0-9]{3,}*.txt
for i in "${patt}"; do echo "$i"
if ! [[ "$i" =~ $str ]]; then rm "$i" ; fi
done
but I got an error :
*v[0-9]{3,}*.txt
rm: cannot remove '*v[0-9]{3,}*.txt': No such file or directory
The file names look like this: mari_v9009.txt femme_v9009.txt mari_v9010.txt femme_v9010.txt
bash filename expansion does not use regular expressions. See https://www.gnu.org/software/bash/manual/bash.html#Filename-Expansion
To find files with "v followed by 3 or more digits followed by .txt" you'll have to use bash's extended pattern matching.
A demonstration:
$ shopt -s extglob
$ touch mari_v9009.txt femme_v9009.txt mari_v9010.txt femme_v9010.txt
$ touch foo_v12.txt
$ for f in *v[0-9][0-9]+([0-9]).txt; do echo "$f"; done
femme_v9009.txt
femme_v9010.txt
mari_v9009.txt
mari_v9010.txt
What you have with this pattern for i in *v[0-9]{3,}*.txt is:
first, bash performs brace expansion which results in
for i in *v[0-9]3*.txt *v[0-9]*.txt
then, the first word *v[0-9]3*.txt results in no matches, and the default behaviour of bash is to leave the pattern as a plain string. rm tries to delete the file named literally "*v[0-9]3*.txt" and that gives you the "file not found error"
next, the second word *v[0-9]*.txt gets expanded, but the expansion will include files you don't want to delete.
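To see the brace-expansion step on its own, a quick sketch: set -f switches off filename expansion, so what gets printed is exactly the list of words bash would go on to glob. The function name show_expansion is just for illustration.

```bash
#!/usr/bin/env bash
# show_expansion: print the words produced by brace expansion alone.
# Runs in a subshell so that set -f (noglob) does not affect the caller.
show_expansion() (
    set -f
    printf '%s\n' *v[0-9]{3,}*.txt
)

show_expansion
```

This prints the two words described above, *v[0-9]3*.txt and *v[0-9]*.txt, one per line.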
I missed the not from the question.
Try this: within [[ ... ]], the == and != operators are pattern-matching operators, and extended globbing is enabled by default:
keep_pattern='*v[0-9][0-9]+([0-9]).txt'
for file in *; do
if [[ $file != $keep_pattern ]]; then
echo rm "$file"
fi
done
But find would be preferable here, if it's OK to descend into subdirectories:
find . -regextype posix-extended '!' -regex '.*v[0-9]{3,}\.txt' -print
# ...............................^^^
If that returns the files you expect to delete, change -print to -delete
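If you would rather stay in bash, here is a dry-run sketch that combines the versioned-name pattern with the substring test from the question. It assumes the goal is "versioned .txt files that do not contain the string", and it assumes bash 4.1 or later, where extglob syntax is accepted on the right-hand side of == inside [[ ]]. The name list_removable is made up, and nothing is deleted:

```bash
#!/usr/bin/env bash
# list_removable STR: print files in the current directory whose names end in
# "v" + three or more digits + ".txt" and do NOT contain STR.
# Dry run only: it prints candidates instead of calling rm.
list_removable() {
    local str=$1 file
    local pattern='*v[0-9][0-9]+([0-9]).txt'   # extglob syntax, used via [[ == ]]
    for file in *.txt; do
        [[ -f $file && $file == $pattern && $file != *"$str"* ]] || continue
        printf '%s\n' "$file"
    done
}

list_removable 9009
```

Once the printed list looks right, the printf line could be swapped for rm -- "$file".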
You need to remove the quotes in the for loop. Then the filename globs will be interpreted:
for i in ${patt}; do echo "$i"
I assume that you are using Python.
I have tested your regex code, and found the * character unnecessary.
The following seems to work fine: v[0-9]{3,}.txt
Can you please elaborate some more on the issue?
Thanks,
Bren.
I just piped the error message to /dev/null. This worked for me:
#!/bin/bash
str=9009
patt=*v[0-9]{3,}*.txt
rm $(eval ls $patt 2> /dev/null | grep $str)
This is not regex, this is globbing. Take a look what gets expanded:
# echo *v[0-9]{3,}*.txt
*v[0-9]3*.txt femme_v9009.txt femme_v9010.txt mari_v9009.txt mari_v9010.txt
*v[0-9]3*.txt obviously doesn't exist. Can you clarify which files you are trying to match with {3,}? Otherwise leave it out and the pattern will match the kind of filenames you have specified.
http://tldp.org/LDP/abs/html/globbingref.html
I have a CSV being read into a script that has the phrases:
This port supports SSLv3/TLSv1.0.
This port supports TLSv1.0/TLSv1.1/TLSv1.2.
This port supports TLSv1.2.
What I'm looking to do is set up a REGEX variable on the word/number: TLSv1.0
Then reference that variable in an IF/Then statement. The problem I'm
having is getting the regex to see the TLSv1.0. Could somebody help me
craft my BASH script to see TLSv1.0 when it's along a line that starts off with "This port supports"?
#!/bin/sh
REGEX="\TLSv1.0\"
cat filename.csv | awk -F"," '{gsub(/\"/,"",$4);print $5}' | sed s/\"//g |
while IFS=" " read pluginoutput
do
if [[ "$pluginoutput" =~ $REGEX ]]; then
.
. rest of my code
.
You can see that I'm trying to set the regex in the variable, but it just isn't working. Obviously a typo or something. Does anybody have a regex suggestion?
Thanks,
There are a lot of things wrong here. To pick some key ones:
#!/bin/sh specifies that you want your script to be interpreted with a POSIX-compliant interpreter, but doesn't specify which one. Many of these, like ash or dash, don't have [[ ]], =~, or other extensions which your code depends on. Use #!/bin/bash instead.
In REGEX="\TLSv1.0\", the "s are data, not syntax. This means that they're part of the content being searched for when you do [[ $string =~ $regex ]]. By contrast, regex=TLSv1.0, regex="TLSv1.0" or regex='TLSv1.0' will all have the identical effect, of assigning TLSv1.0 as the content of the regex variable.
That said, as a point on regex syntax, you probably want regex='TLSv1[.]0' -- that way it will only match a ., as opposed to treating the dot as a match-any-character wildcard (as it is in regular-expression syntax).
Personally, I might do something more like the following (if I had a reason to do the parsing in bash rather than to let a single egrep call process all your input):
#!/bin/bash
regex='(^|,)"?This port supports .*TLSv1[.]0.*[.]"?($|,)'
while IFS= read -r line; do
[[ $line =~ $regex ]] && echo "Found TLSv1.0 support"
done
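A smaller sketch that isolates the literal-dot point, using the three sample phrases from the question as input. The helper name mentions_tls10 is hypothetical:

```bash
#!/usr/bin/env bash
# mentions_tls10 LINE: succeed if LINE contains a literal "TLSv1.0".
# [.] matches only a dot; a bare . would match any single character.
mentions_tls10() {
    local re='TLSv1[.]0'
    [[ $1 =~ $re ]]
}

while IFS= read -r line; do
    if mentions_tls10 "$line"; then
        echo "TLSv1.0 offered: $line"
    fi
done <<'EOF'
This port supports SSLv3/TLSv1.0.
This port supports TLSv1.0/TLSv1.1/TLSv1.2.
This port supports TLSv1.2.
EOF
```

The first two phrases are reported and the third is skipped.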
I have downloaded a few epub files and I need to convert them to epub again so that my ebook reader can read them.
I can do conversion in batch fairly easily using R as below:
setwd('~/Downloads/pubmed')
epub.files = list.files('./',full.names = TRUE,pattern = 'epub$')
for (loop in (1:length(epub.files))) {
command = paste('ebook-convert ',
epub.files[loop],
gsub('\\.epub','.mod.epub',epub.files[loop]))
system(command)
}
But I don't know how to do it using linux bash, I don't know: i) how to assign a variable within a for-loop, and ii) how to use regular expression to replace string in bash.
Can anyone help? Thanks.
You can also use bash's parameter substitution:
for i in *.epub; do
ebook-convert "${i}" "${i/%.epub/.mod.epub}"
done
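A tiny sketch showing what that substitution does on its own, without calling ebook-convert. The helper name mod_name is made up for the example:

```bash
#!/usr/bin/env bash
# mod_name FILE: print FILE with a trailing .epub replaced by .mod.epub.
# In ${var/%pattern/replacement} the % anchors the pattern at the end,
# so an .epub in the middle of the name is left alone.
mod_name() {
    printf '%s\n' "${1/%.epub/.mod.epub}"
}

mod_name "my book.epub"      # my book.mod.epub
mod_name "notes.epub.bak"    # notes.epub.bak (no trailing .epub, unchanged)
```

Quoting the expansions is what keeps names with spaces in one piece.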
You can use find and sed:
cd ~/Downloads/pubmed
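# quote the regex so the shell does not glob it; note that this loop
# still splits on whitespace, so it assumes file names without spaces
for f in $(find . -regex '.*\.epub$'); do
ebook-convert "$f" "$(echo "$f" | sed 's/\.epub$/.mod.epub/')"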
done
Not sure what ebook-convert is, but if you're trying to rename the files, try the following. Paste it into a file with the .sh extension (to signify a shell script) and make sure it is executable (chmod +x your-file.sh).
#!/bin/bash
FILES=~/Downloads/pubmed/*.epub
for f in $FILES
do
# $f stores the current file name, =~ is the regex operator
# only rename non-modified epub files
if [[ ! "$f" =~ \.mod\.epub$ ]]
then
echo "Processing $f file..."
# take action on each file
mv "$f" "${f%.*}.mod.epub"
fi
done
You will need bash version 3 or greater for regex support. This can be implemented w/o regular expression as well.
You can use GNU parallel in combination with find:
find ~/Downloads/pubmed -name '*.epub' | parallel --gnu ebook-convert {} {.}.mod.epub
It should be available on most distributions and may have speed advantages over an ordinary loop if you process a large number of files. Although speed was not part of the original question...
Consider the following bash script with a simple regular expression:
for f in "$FILES"
do
echo $f
sed -i '/HTTP|RT/d' $f
done
This script should read every file in the directory specified by FILES and remove the lines containing 'HTTP' or 'RT'. However, it seems that the OR part of the regular expression is not working. That is, if I just have sed -i '/HTTP/d' $f then it removes all lines containing HTTP, but I cannot get it to remove both HTTP and RT.
What must I change in my regular expression so that lines with HTTP or RT are removed?
Thanks in advance!
Two ways of doing it (at least):
Having sed understand your regex:
sed -E -i '/HTTP|RT/d' $f
Specifying each token separately:
sed -i '/HTTP/d;/RT/d' $f
Before you do anything, run with the opposite, and PRINT what you plan to DELETE:
sed -n -e '/HTTP/p' -e '/RT/p' $f
Just to be sure you are deleting only what you want to delete before actually changing the files.
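A quick sketch you can run without touching any files: it feeds a few sample lines through the -E form as a filter. The function name drop_http_rt and the sample lines are invented for the example:

```bash
#!/usr/bin/env bash
# drop_http_rt: copy stdin to stdout without lines containing HTTP or RT.
# -E switches sed to extended regular expressions, where | is alternation.
drop_http_rt() {
    sed -E '/HTTP|RT/d'
}

printf '%s\n' 'GET HTTP/1.1' 'plain line' 'RT @someone' 'http lower case' | drop_http_rt
```

Only "plain line" and "http lower case" come out the other end, since the match is case-sensitive.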
"It's not a question of whether you are paranoid or not, but whether you are paranoid ENOUGH."
Well, first of all, it will process all WORDS in the FILES variable.
If you want it to do all files in the FILES directory, then you need something like this:
for f in $( find $FILES -maxdepth 1 -type f )
do
echo $f
sed -i -e '/HTTP/d' -e '/RT/d' $f
done
You just need two "-e" options to sed.
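A variation on the loop, sketched with a plain glob instead of find so that file names with spaces survive. It also writes to a temporary file and renames it rather than using sed -i; that is my choice for portability, not something the question requires. clean_dir is a made-up name:

```bash
#!/usr/bin/env bash
# clean_dir DIR: remove lines containing HTTP or RT from every regular
# file directly inside DIR (no recursion into subdirectories).
clean_dir() {
    local dir=$1 f
    for f in "$dir"/*; do
        [[ -f $f ]] || continue
        # Filter into a temporary file, then replace the original
        # only if sed succeeded.
        sed -e '/HTTP/d' -e '/RT/d' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
    done
}
```

It would be called as clean_dir "$FILES", with FILES holding the directory path.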