bash compare regex expression and not the variable - regex

I want to remove all files whose names contain a given substring and ignore the files that do not contain it, so I use a regular expression:
str=9009
patt=*v[0-9]{3,}*.txt
for i in "${patt}"; do echo "$i"
if ! [[ "$i" =~ $str ]]; then rm "$i" ; fi done
but I get an error:
*v[0-9]{3,}*.txt
rm: cannot remove '*v[0-9]{3,}*.txt': No such file or directory
The file names look like this: mari_v9009.txt femme_v9009.txt mari_v9010.txt femme_v9010.txt

bash filename expansion does not use regular expressions. See https://www.gnu.org/software/bash/manual/bash.html#Filename-Expansion
To find files with "v followed by 3 or more digits followed by .txt" you'll have to use bash's extended pattern matching.
A demonstration:
$ shopt -s extglob
$ touch mari_v9009.txt femme_v9009.txt mari_v9010.txt femme_v9010.txt
$ touch foo_v12.txt
$ for f in *v[0-9][0-9]+([0-9]).txt; do echo "$f"; done
femme_v9009.txt
femme_v9010.txt
mari_v9009.txt
mari_v9010.txt
What you have with this pattern for i in *v[0-9]{3,}*.txt is:
first, bash performs brace expansion which results in
for i in *v[0-9]3*.txt *v[0-9]*.txt
then, the first word *v[0-9]3*.txt results in no matches, and the default behaviour of bash is to leave the pattern as a plain string. rm tries to delete the file named literally "*v[0-9]3*.txt" and that gives you the "file not found error"
next, the second word *v[0-9]*.txt gets expanded, but the expansion will include files you don't want to delete.
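The order of expansions described above can be observed directly. The following is a minimal sketch that runs in a scratch directory (created with mktemp, an assumption of this example) so that no real files are involved; the two file names are taken from the question.

```bash
#!/usr/bin/env bash
# Work in a scratch directory so no real files are involved.
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
touch mari_v9009.txt femme_v9009.txt

# Brace expansion runs before filename expansion, so {3,} is split into
# the two alternatives "3" and "" instead of acting as a repetition count.
words=( *v[0-9]{3,}*.txt )
printf '<%s>\n' "${words[@]}"
```

The first word printed is the unmatched pattern itself, left as a literal string; the remaining words are the files matched by *v[0-9]*.txt.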
I missed the not from the question.
Try this: within [[ ... ]], the == and != operators perform pattern matching, and extended globbing is enabled by default there.
keep_pattern='*v[0-9][0-9]+([0-9]).txt'
for file in *; do
if [[ $file != $keep_pattern ]]; then
echo rm "$file"
fi
done
But find would be preferable here, if it's OK to descend into subdirectories:
find . -regextype posix-extended '!' -regex '.*v[0-9]{3,}\.txt' -print
# ...............................^^^
If that returns the files you expect to delete, change -print to -delete
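Putting the pieces together for the original question, here is a dry-run sketch. It assumes the intent matches the question's code: among the files named "v plus three or more digits plus .txt", remove those whose name does not contain $str. The function name list_removals and the variable version_glob are made up for this example; nothing is deleted, the rm commands are only printed.

```bash
#!/usr/bin/env bash
shopt -s nullglob extglob

str=9009                                   # substring from the question
version_glob='*v[0-9][0-9]+([0-9]).txt'    # "v", then three or more digits

# Print (do not run) an rm command for every versioned file lacking $str.
list_removals() {
    local f
    for f in *.txt; do
        [[ $f == $version_glob ]] || continue   # not a versioned file: ignore
        [[ $f == *"$str"* ]] && continue        # contains $str: keep it
        printf 'rm -- %s\n' "$f"
    done
}
```

Run list_removals in the directory that holds the files, check the printed commands, and only then replace the printf with an actual rm.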

You need to remove the quotes in the for loop. Then the filename globs will be interpreted:
for i in ${patt}; do echo "$i"

I assume that you are using Python.
I have tested your regex code, and found the * character unnecessary.
The following seems to work fine: v[0-9]{3,}.txt
Can you please elaborate some more on the issue?
Thanks,
Bren.

I just piped the error message to /dev/null. This worked for me:
#!/bin/bash
str=9009
patt=*v[0-9]{3,}*.txt
rm $(eval ls $patt 2> /dev/null | grep $str)

This is not regex, this is globbing. Take a look what gets expanded:
# echo *v[0-9]{3,}*.txt
*v[0-9]3*.txt femme_v9009.txt femme_v9010.txt mari_v9009.txt mari_v9010.txt
*v[0-9]3*.txt obviously doesn't exist. Can you clarify which files you are trying to match with {3,}? Otherwise leave it out, and the pattern will match the kind of filenames you have specified.
http://tldp.org/LDP/abs/html/globbingref.html

Related

using regex to iterate over files that matches a certain pattern in bash scripts

I have a regex pattern ([0-9]*\.)+[0-9]*_to_([0-9]*\.)+[0-9]*; a matching string would be 11.1.1.1_to_21.1.1.1. I want to discover all files under a directory whose names match the above pattern.
However I am not able to get it correctly using the code below. I tried to escape ( and ) by adding \ before them, but that did not work.
dir=$SCRIPT_PATH/oaa_partition/upgrade/([0-9]*\.)+[0-9]*_to_([0-9]*\.)+[0-9]*.sql
for FILE in $dir; do
echo $FILE
done
I was only able to get something like this to work:
dir=$SCRIPT_PATH/oaa_partition/upgrade/[0-9]*_to_*.sql
for FILE in $dir; do
echo $FILE
done
I need some help on how to use the full regex pattern ([0-9]*\.)+[0-9]*_to_([0-9]*\.)+[0-9]* here.
Your regex is simple enough for replacing it with a bash extglob
#!/bin/bash
shopt -s extglob
glob='+(*([0-9]).)*([0-9])_to_+(*([0-9]).)*([0-9]).sql'
for file in "$SCRIPT_PATH"/oaa_partition/upgrade/$glob
do
printf '%q\n' "$file"
done
If the regex is too complex for translating it to extended globs then you can filter the files using a bash regex inside the for loop:
#!/bin/bash
regex='([0-9]*\.)+[0-9]*_to_([0-9]*\.)+[0-9]*\.sql'
for file in "$SCRIPT_PATH"/oaa_partition/upgrade/*_to_*.sql
do
[[ $file =~ /$regex$ ]] || continue   # $regex already ends in \.sql
printf '%q\n' "$file"
done
BTW, as it is, your regex could match a lot of unwanted things, for example: 0._to_..sql.
If this is enough for differentiating the targeted files from the others then you can probably just use the basic glob
[0-9]*_to_[0-9]*.sql
To fix the regex you would want to match at least 1 number before the dot, and if you go with it, a literal dot before the sql
([0-9]+\.)+[0-9]*_to_([0-9]+\.)+[0-9]*\.sql
https://regex101.com/r/5xB3Bt/1
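A quick way to check the corrected regex against sample names is bash's own =~ operator. This is a sketch only: the ^ and $ anchors and the helper name is_upgrade_script are additions made for this example, not part of the answer above.

```bash
#!/usr/bin/env bash
# Anchored version of the corrected regex; the ^ and $ anchors are an addition.
regex='^([0-9]+\.)+[0-9]*_to_([0-9]+\.)+[0-9]*\.sql$'

# Succeeds when the base name of the given path matches the regex.
is_upgrade_script() {
    [[ ${1##*/} =~ $regex ]]
}

for name in 11.1.1.1_to_21.1.1.1.sql 0._to_..sql notes.sql; do
    if is_upgrade_script "$name"; then
        echo "match:    $name"
    else
        echo "no match: $name"
    fi
done
```

Keeping the regex in a variable and expanding it unquoted on the right-hand side of =~ avoids quoting problems with the parentheses and backslashes.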
You cannot use a regular expression in a for loop. It only supports glob patterns, which are not as expressive as a regex.
You will have to use your regex in gnu-find command as:
find . -mindepth 1 -maxdepth 1 -regextype egrep -regex '.*/([0-9]*\.)+[0-9]*_to_([0-9]*\.)+[0-9]*\.sql'
To loop these entries:
while IFS= read -rd '' file; do
echo "$file"
done < <(find . -mindepth 1 -maxdepth 1 -regextype egrep -regex '.*/([0-9]*\.)+[0-9]*_to_([0-9]*\.)+[0-9]*\.sql' -print0)

Bash check whether my asterisk expansion has any matches

I use the following code to search for matching files in the current directory:
# Check for existing backups
EXISTINGFILES="./dropbox-backup-*.tar.gz"
if [[ ! -z "$EXISTINGFILES" ]]
then
echo "Found existing backups!"
# do stuff here...
else
echo "Did not find any existing backups."
fi
This method for finding matching files allows me to iterate over matches with a loop such as for f in $EXISTINGFILES, but it never detects when no matching files were found.
How can I modify my code above to detect when no matching files were found??
Use this instead:
EXISTINGFILES=`find . -type f -name 'dropbox-backup-*.tar.gz'`
Explanation:
The problem with EXISTINGFILES=./dropbox-backup-*.tar.gz, in the context of your script, is that $EXISTINGFILES will always be non-empty, since you are assigning it a value (the literal string "./dropbox-backup-*.tar.gz").
In the above solution, we first find the file(s) and assign the result to the variable. If no file is found, the variable is empty and your script goes to the else block (i.e. it indicates that no matching files were found).
You seem to be looking for nullglob. Say:
shopt -s nullglob
at the top of your script.
$ ls foobar*
ls: foobar*: No such file or directory
$ for i in foobar*; do echo $i; done # Print foobar* if doesn't find match
foobar*
$ shopt -s nullglob
$ for i in foobar*; do echo $i; done # Doesn't print anything if no match found
$
Turn existing_files into bash array and also use nullglob
shopt -s nullglob
existing_files=(./dropbox-backup-*.tar.gz)
if ((${#existing_files[@]})); then echo 'files exist'; fi
The problem is that you are quoting the asterisk both when you set the value of EXISTINGFILES and when you expand it, so that it is never treated as a wild-card pattern. I would skip the use of variable and use the pattern by itself like this:
for f in ./dropbox-backup-*.tar.gz; do
if [[ -f "$f" ]]; then
echo "Found existing backups!"
# do stuff here...
else
echo "Did not find any existing backups."
fi
break
done
If you want to store the full list of matching names, use an array:
EXISTINGFILES=( ./dropbox-backup-*.tar.gz )
if [[ -f "${EXISTINGFILES[0]}" ]]; then
By default, a pattern that matches 0 files is treated literally, which is why I use -f in the above examples: to test if the "file" (which might be the literal pattern instead) actually exists. You can modify the default behavior so that a non-matching pattern simply vanishes instead:
shopt -s nullglob
EXISTINGFILES=( ./dropbox-backup-*.tar.gz )
for f in "${EXISTINGFILES[@]}"; do
# Do stuff; there is at least one matching file if we are in
# the body of the loop
done
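The nullglob-plus-array idea can also be wrapped in a small helper so that the shell option does not leak into the rest of the script. This is a sketch; the function name has_matches is made up for this example.

```bash
#!/usr/bin/env bash
# Succeeds if the glob pattern given as $1 matches at least one path.
# The function body is a subshell, so nullglob and IFS are changed
# only inside it and the caller's settings stay untouched.
has_matches() (
    shopt -s nullglob
    IFS=                  # no word splitting, so patterns may contain spaces
    matches=( $1 )        # unquoted on purpose: let the shell expand the glob
    (( ${#matches[@]} > 0 ))
)

if has_matches './dropbox-backup-*.tar.gz'; then
    echo "Found existing backups!"
else
    echo "Did not find any existing backups."
fi
```

The pattern is passed in quotes so that it reaches the function unexpanded; the expansion then happens inside, with nullglob in effect.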

bash: String Operator with regular expression

I want to list all files in my home folder, and remove the # in the filename.
For example:
#.emacs# should be printed out as .emacs
This is my code
for dir in $(ls ~)
do
# trim trailing
filename="${dir#\#}"
echo ${filename}
done
But it's still showing files preceding with # though I managed the regular expression ${dir#\#} in terminal.
Can you tell me where's the smell in my code?
remove # from filename, should be :
filename="${dir//#/}"
edit: in some systems (such as Solaris), above command doesn't work, you need escape.
filename="${dir//\#/}"
The rest are fine for both cygwin and Solaris.
If you need remove all contents before #
filename="${dir##*#}"
If you need remove all contents after #
filename="${dir%%#*}"
Here is the full explanation, copied from bash Substring Replacement:
${string/substring/replacement}
Replace first match of $substring with $replacement.
${string//substring/replacement}
Replace all matches of $substring with $replacement.
${string%substring}
Deletes shortest match of $substring from back of $string.
${string%%substring}
Deletes longest match of $substring from back of $string.
${string#substring}
Deletes shortest match of $substring from front of $string.
${string##substring}
Deletes longest match of $substring from front of $string.
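As an illustration of the operators listed above, here they are applied to the file name from the question. The '#' is escaped with a backslash throughout, as in the Solaris note earlier in this answer.

```bash
#!/usr/bin/env bash
name='#.emacs#'

echo "${name/\#/}"     # first '#' removed            -> .emacs#
echo "${name//\#/}"    # every '#' removed            -> .emacs
echo "${name#\#}"      # shortest match at the front  -> .emacs#
echo "${name%\#}"      # shortest match at the back   -> #.emacs
echo "${name##*\#}"    # longest front match of '*#'  -> (empty string)
echo "${name%%\#*}"    # longest back match of '#*'   -> (empty string)
```

The last two lines print an empty string because the patterns *# and #* each match the whole name.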
Don't parse ls. You can just use bash wildcard expansion instead. Also, your use of parameter expansion is wrong: ${word#something} removes something from the start of the value (the prefix), not from the end (the suffix). So try
#!/bin/bash
for dir in ~/*
do
# trim trailing
filename="${dir%#}"
echo "${filename}"
done
Here's a - hopefully - instructive version:
#!/usr/bin/env bash
# Make pathname expansion match files that start with '.', too.
shopt -s dotglob
# Loop over all files/dirs. in the home folder.
for f in ~/*; do
# Exit, if no files/dirs match at all (this test may
# not be necessary if `shopt -s nullglob` is in effect).
# Use -f to only match files, -d to only match dirs.
[[ -e $f ]] || break
# Remove the path component ...
filename=$(basename "$f")
# ... and then all '#' chars. from the name.
filename="${filename//#/}"
# Process result
echo "${filename}"
done
As others have noted, you should not parse ls output - direct pathname expansion of globs (wildcard patterns) is always the better choice.
shopt -s dotglob ensures that files or dirs whose name starts with . are included in pathname expansion.
Pathname expansions occurs with the path component intact, so to get the mere filename from the loop variable, basename must be applied (first), in order to strip the path component.
Probably not an issue here, but unless shopt -s nullglob is in effect (not by default), a glob that matches nothing is left untouched, so the loop is entered with an invalid filename - hence the [[ -e ... ]] test.
You only echo the file name; you do not rename the file. First cd to the home directory in the script, then rename the files. The script below finds file names that contain a # character and removes the # from them.
#! /bin/bash
cd ~
for i in $(ls ~ )
do
if [[ "${i}" == *#* ]]
then
var=$(echo "$i" | sed 's/#//g' )   # g: remove every '#', not only the first
printf "%s\n" "$var" #to print only
#mv "$i" "$var" #to rename
fi
done
You didn’t state that your files have # at the start and end of filename earlier. Try something like:
for dir in ~/*; do
filename="${dir##*/}"   # strip the directory part, otherwise the name never starts with #
filename="${filename#\#}"
filename="${filename%\#}"
echo "$dir ---> ${filename}"
done
or use what BMW has shown as his first example:
for dir in ~/*; do
filename="${dir//#/}"
echo "$dir ---> ${filename}"
done
Once you are satisfied with echo’s output. You can replace that with mv.
P.S: Re-iterating what BroSlow stated. Don’t parse ls.
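For the step from echo to mv, a dry-run switch keeps things safe. The following is a sketch only: the function name strip_hashes and the DRY_RUN variable are made up for this example, and it prints by default instead of renaming.

```bash
#!/usr/bin/env bash
# Rename every entry in directory $1 whose name contains '#', dropping
# all '#' characters. With DRY_RUN=1 (the default here) it only prints.
# The subshell body keeps the shopt changes away from the caller.
strip_hashes() (
    shopt -s nullglob dotglob
    for path in "$1"/*'#'*; do
        name=${path##*/}             # drop the directory part
        new=${name//'#'/}            # drop every '#'
        [[ -n $new ]] || continue    # name consisted only of '#'
        if [[ ${DRY_RUN:-1} == 1 ]]; then
            printf '%s ---> %s\n' "$name" "$new"
        else
            mv -n -- "$path" "$1/$new"   # -n: never overwrite a file
        fi
    done
)
```

Call it as strip_hashes ~ to preview, and as DRY_RUN=0 strip_hashes ~ once the preview looks right.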

bash script with simple regular expression

Consider the following bash script with a simple regular expression:
for f in "$FILES"
do
echo $f
sed -i '/HTTP|RT/d' $f
done
This script should read every file in the directory specified by FILES and remove the lines that contain 'HTTP' or 'RT'. However, the OR part of the regular expression does not seem to work: with just sed -i '/HTTP/d' $f it removes all lines containing HTTP, but I cannot get it to remove both the HTTP and the RT lines.
What must I change in my regular expression so that lines with HTTP or RT are removed?
Thanks in advance!
Two ways of doing it (at least):
Having sed understand your regex:
sed -E -i '/HTTP|RT/d' $f
Specifying each token separately:
sed -i '/HTTP/d;/RT/d' $f
Before you do anything, run with the opposite, and PRINT what you plan to DELETE:
sed -n -e '/HTTP/p' -e '/RT/p' $f
Just to be sure you are deleting only what you want to delete before actually changing the files.
"It's not a question of whether you are paranoid or not, but whether you are paranoid ENOUGH."
Well, first of all, it will process all WORDS in the FILES variable.
If you want it to do all files in the FILES directory, then you need something like this:
for f in $( find $FILES -maxdepth 1 -type f )
do
echo $f
sed -i -e '/HTTP/d' -e '/RT/d' $f
done
You just need two "-e" options to sed.
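The difference between the original command and the working variants can be seen without touching any file. This sketch feeds three made-up sample lines to sed through a pipe, so -i is not needed.

```bash
#!/usr/bin/env bash
sample=$'GET HTTP/1.1\nRT @someone\nplain line'

# Basic regex (the default): "|" is an ordinary character, so the pattern
# looks for the literal text "HTTP|RT" and no line is deleted.
printf '%s\n' "$sample" | sed '/HTTP|RT/d'

# Extended regex (-E): "|" means alternation, so only "plain line" survives.
printf '%s\n' "$sample" | sed -E '/HTTP|RT/d'

# Two separate expressions give the same result without -E.
printf '%s\n' "$sample" | sed -e '/HTTP/d' -e '/RT/d'
```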

Regexp for extensions tgz, tar.gz, TGZ and TAR.GZ

I'm trying to get a regexp (in bash) to identify files with only the following extensions:
tgz, tar.gz, TGZ and TAR.GZ.
I tried several but can't get it to work.
I'm using this regexp to select only files with those extensions to do some work with them:
if [ -f $myregexp ]; then
.....
fi
thanks.
Try this:
#!/bin/bash
# no case match
shopt -s nocasematch
matchRegex='\.(tgz|tar\.gz)$'
for f in *
do
# display filtered files ($matchRegex must stay unquoted, or it is matched literally)
[[ -f "$f" ]] && [[ "$f" =~ $matchRegex ]] && echo "$f"
done
I have found an elegant way of doing this:
shopt -s nocasematch
for file in *;
do
[[ "$file" =~ .*\.(tar\.gz|tgz)$ ]] && echo "$file"
done
This may suit you, since you seem to want to use if and a bash regex. The =~ operator checks whether a string matches a given regular expression. shopt -s nocasematch has to be set to perform a case-insensitive match.
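Since the question asks for something usable in an if, the =~ test can be wrapped in a function. This is a sketch: the name is_tarball is made up for this example, and the function body is a subshell so that nocasematch does not stay enabled for the rest of the script.

```bash
#!/usr/bin/env bash
# Succeeds if $1 ends in .tgz or .tar.gz, in any letter case.
is_tarball() (
    shopt -s nocasematch
    re='\.(tgz|tar\.gz)$'
    [[ $1 =~ $re ]]
)

for name in backup.tgz BACKUP.TAR.GZ notes.txt footar.gz; do
    if is_tarball "$name"; then
        echo "$name"
    fi
done
```

Of the four sample names, only backup.tgz and BACKUP.TAR.GZ are printed; footar.gz is rejected because the regex requires a dot before "tar".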
Use this pattern
.*\.{1}(tgz|tar\.gz)
But how to make a regular expression case-insensitive? It depends on the language you use. In JavaScript they use /pattern/i, in which, i denotes that the search should be case-insensitive. In C# they use RegexOptions enumeration.
Depends on where you want to use this regex. If with grep, use grep -E (formerly egrep) with the -i parameter, which stands for "ignore case":
grep -Ei '\.(tgz|tar\.gz)$'
Write 4 regexes, and check whether the file name matches any of them. Or write 2 case-insensitive regexes.
This way the code will be much more readable (and easier) than writing 1 regex.
You can even do it without a regex (a bit wordy though):
for f in *.[Tt][Gg][Zz] *.[Tt][Aa][Rr].[Gg][Zz]; do
echo $f
done
In bash? Use curly brackets, *.{tar.gz,tgz,TAR.GZ,TGZ} or even *.{t{ar.,}gz,T{AR.,}GZ}. Thus, ls -l *.{t{ar.,}gz,T{AR.,}GZ} on the command-line will do a detailed listing of all files with the matching extensions.
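Instead of spelling out every letter case by hand, bash can also be told to glob case-insensitively. The sketch below assumes bash's nocaseglob option; the function name list_tarballs is made up for this example, and nullglob is added so that a pattern without matches disappears instead of being passed on literally.

```bash
#!/usr/bin/env bash
# Print the regular files in directory $1 whose names end in .tgz or
# .tar.gz, in any letter case. The subshell body confines cd and shopt.
list_tarballs() (
    shopt -s nullglob nocaseglob
    cd -- "$1" || exit 1
    for f in *.tgz *.tar.gz; do
        if [[ -f $f ]]; then
            printf '%s\n' "$f"
        fi
    done
)
```

Usage: list_tarballs . lists the matching files of the current directory, one per line.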