ShellScript - IF handling filename pattern - regex

I have a directory on UNIX that holds thousands of .tgz compressed files. They follow this pattern:
01.red.something.tgz
02.red.something.tgz
03.red.anything.tgz
04.red.something.tgz
01.blue.something.tgz
02.blue.everything.tgz
03.blue.something.tgz
04.blue.something.tgz
01.yellow.something.tgz
02.yellow.blablathing.tgz
03.yellow.something.tgz
04.yellow.something.tgz
They take up a lot of space on the filesystem, and I need to list their contents without extracting the archives themselves. Doing this by hand would take a while, so I believe a shell script fits the need. I'm fairly new to shell scripting and still learning, so I wrote this .sh:
$pattern = "red"
for file in *.tgz
do
if [[ ${file} == '...${pattern}.*.tgz' ]]; then
echo" ==> ${file} match the pattern and the output dir is : out/"
tar -tf $file > ./out/$file
else
echo "${file} Doesn't match the pattern"
fi
done
But I've done something wrong in the if part: even when the pattern matches, I get the 'Doesn't match the pattern' message.
I know it's a simple if, but I can't understand why it doesn't work. I'd be thankful if you could explain why.
Thank you.

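You need to watch out for spaces when you create variables in bash: write pattern="red", with no leading $ and no spaces around the =. In the if, the right-hand side must not be wrapped in ' (single quotes) or " (double quotes) if you want it treated as a pattern rather than a literal string. Note that == inside [[ ]] does glob-style pattern matching, not regex matching. Use: if [[ ${file} == ${regEx} ]];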
Test:
$ ls *.tgz
01.red.something.tgz 01.yellow.something.tgz
$ ./t.sh
==> 01.red.something.tgz match the pattern and the output dir is : out/
01.yellow.something.tgz Doesn't match the pattern
$ cat t.sh
#!/bin/bash
pattern="red"
regEx="*.${pattern}.*.tgz"
for file in *.tgz
do
if [[ ${file} == ${regEx} ]]; then
echo " ==> ${file} match the pattern and the output dir is : out/"
#tar -tf $file > ./out/$file
else
echo "${file} Doesn't match the pattern"
fi
done
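The effect of quoting the right-hand side can be seen in a minimal sketch. The function names matches_color and matches_literally are made up for illustration and are not part of the answer above; the pattern ??.<color>.*.tgz is an assumption based on the file names in the question.

```shell
#!/bin/bash
# Hypothetical helper: succeeds when the name looks like NN.<color>.<anything>.tgz.
matches_color() {
    local file=$1 color=$2
    # Unquoted wildcards on the right-hand side: ? and * act as glob characters.
    [[ $file == ??."$color".*.tgz ]]
}

# Hypothetical counter-example: the whole right-hand side is quoted,
# so ? and * are compared as literal characters and real names never match.
matches_literally() {
    local file=$1 color=$2
    [[ $file == "??.${color}.*.tgz" ]]
}

for f in 01.red.something.tgz 02.blue.everything.tgz; do
    if matches_color "$f" red; then
        echo "$f matches"
    else
        echo "$f does not match"
    fi
done
```

Only the wildcard parts need to stay unquoted; quoting "$color" keeps any special characters in the variable literal.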

Related

How to fix if statement always returning false when using regex

After downloading a torrent I want to move it depending on whether it's a movie or a series. If it's a series, the name contains S or s followed by 2 digits, for example S01.
I tried a simple if command with a regex, but my code always gets a false result.
if [[ "$1" =~ ([s|S][0-9][0-9]\w*)\b ]]; then
mv to series folder command
else
mv to movies folder command
fi
It doesn't matter what I pass as the $1 parameter; the if result is always false.
Your match expression is incorrect - to match s or S, just use [sS]
if [[ "$1" =~ ([sS][0-9][0-9]*) ]]; then
echo "series"
else
echo "movie"
fi
Output:
$ bash del.sh s01
series
$ bash del.sh fff
movie
$ bash del.sh S02
series
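A slightly stricter variant, offered only as a sketch: it keeps the regex in a variable (which sidesteps shell quoting surprises inside [[ ... =~ ... ]]) and requires exactly the two digits described in the question. The function name classify and the sample file names are assumptions for illustration.

```shell
#!/bin/bash
# Hypothetical helper: prints "series" when the name contains S or s
# followed by two digits, otherwise prints "movie".
classify() {
    local re='[sS][0-9]{2}'   # POSIX extended regex, stored in a variable
    if [[ $1 =~ $re ]]; then
        echo series
    else
        echo movie
    fi
}

classify "Show.Name.S01E02.mkv"   # prints: series
classify "Some.Film.2019.mkv"     # prints: movie
```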

bash compare regex expression and not the variable

I want to remove all files whose names contain a given substring and ignore those that do not, so I used a regular expression:
str=9009
patt=*v[0-9]{3,}*.txt
for i in "${patt}"; do echo "$i"
if ! [[ "$i" =~ $str ]]; then rm "$i" ; fi done
but I got an error:
*v[0-9]{3,}*.txt
rm: cannot remove '*v[0-9]{3,}*.txt': No such file or directory
The file names look like this: mari_v9009.txt femme_v9009.txt mari_v9010.txt femme_v9010.txt
bash filename expansion does not use regular expressions. See https://www.gnu.org/software/bash/manual/bash.html#Filename-Expansion
To find files with "v followed by 3 or more digits followed by .txt" you'll have to use bash's extended pattern matching.
A demonstration:
$ shopt -s extglob
$ touch mari_v9009.txt femme_v9009.txt mari_v9010.txt femme_v9010.txt
$ touch foo_v12.txt
$ for f in *v[0-9][0-9]+([0-9]).txt; do echo "$f"; done
femme_v9009.txt
femme_v9010.txt
mari_v9009.txt
mari_v9010.txt
What you have with this pattern for i in *v[0-9]{3,}*.txt is:
first, bash performs brace expansion which results in
for i in *v[0-9]3*.txt *v[0-9]*.txt
then, the first word *v[0-9]3*.txt results in no matches, and the default behaviour of bash is to leave the pattern as a plain string. rm tries to delete the file named literally "*v[0-9]3*.txt" and that gives you the "file not found error"
next, the second word *v[0-9]*.txt gets expanded, but the expansion will include files you don't want to delete.
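The brace-expansion step can be observed on its own with a small sketch. Pathname expansion is switched off with set -f so the resulting words stay visible; the function name show_brace_expansion is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the words produced by brace expansion alone.
# It runs in a subshell so that set -f does not leak into the caller.
show_brace_expansion() (
    set -f                     # disable pathname expansion (globbing)
    echo *v[0-9]{3,}*.txt      # {3,} expands to "3" and to the empty string
)

show_brace_expansion           # prints: *v[0-9]3*.txt *v[0-9]*.txt
```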
I missed the not from the question.
Try this: within [[ ... ]], the == and != operators are pattern-matching operators, and extended globbing is enabled there by default.
keep_pattern='*v[0-9][0-9]+([0-9]).txt'
for file in *; do
if [[ $file != $keep_pattern ]]; then
echo rm "$file"
fi
done
But find would be preferable here, if it's OK to descend into subdirectories:
find . -regextype posix-extended '!' -regex '.*v[0-9]{3,}\.txt' -print
# ...............................^^^
If that returns the files you expect to delete, change -print to -delete
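For a glob-only dry run, the following sketch may help. It assumes the goal mirrors the question's test (act on files matching "v plus three or more digits" whose names do not contain the given string), and it only prints candidates instead of deleting them. The function name list_candidates and the scratch directory are illustrative assumptions.

```shell
#!/bin/bash
shopt -s extglob nullglob   # enable +(...) patterns; unmatched globs expand to nothing

# Hypothetical helper: list files in directory $1 named like *v<3+ digits>.txt
# whose names do NOT contain the string $2.
list_candidates() {
    local dir=$1 str=$2 f
    for f in "$dir"/*v[0-9][0-9]+([0-9]).txt; do
        [[ ${f##*/} == *"$str"* ]] || printf '%s\n' "${f##*/}"
    done
}

# Demo in a scratch directory so that no real files are touched.
demo=$(mktemp -d)
touch "$demo"/{mari,femme}_v9009.txt "$demo"/{mari,femme}_v9010.txt "$demo"/foo_v12.txt
list_candidates "$demo" 9009
```

With the sample names from the question this prints femme_v9010.txt and mari_v9010.txt; foo_v12.txt is skipped because it has only two digits. Replacing printf with rm would be the destructive step, once the list looks right.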
You need to remove the quotes in the for loop. Then the filename globs will be interpreted:
for i in ${patt}; do echo "$i"
I assume that you are using Python.
I have tested your regex code, and found the * character unnecessary.
The following seems to work fine: v[0-9]{3,}.txt
Can you please elaborate some more on the issue?
Thanks,
Bren.
I just piped the error message to /dev/null. This worked for me:
#!/bin/bash
str=9009
patt=*v[0-9]{3,}*.txt
rm $(eval ls $patt 2> /dev/null | grep $str)
This is not regex, this is globbing. Take a look what gets expanded:
# echo *v[0-9]{3,}*.txt
*v[0-9]3*.txt femme_v9009.txt femme_v9010.txt mari_v9009.txt mari_v9010.txt
*v[0-9]3*.txt obviously doesn't exist. Can you clarify which files you are trying to match with {3,}? Otherwise leave it out and the pattern will match the kind of file names you have specified.
http://tldp.org/LDP/abs/html/globbingref.html

Extract all text between two directory entries in a file

I have a file that documents the structure of several directories. I am trying to print the text for each directory individually. My input file looks like this:
$ cat file.txt
/bin:
file_1
file_2
file_3
/sbin:
file_a
file_b
file_c
/usr/local/bin:
doc_a
doc_b
doc_c
What I'm trying to do is print a specific section of the file based on user selection:
#!/bin/bash
PS3=$'\nMake a selection '
select dir in $(grep ':' file.txt;) do
case $REPLY in
[0-9]) echo $dir
# Need something here. Maybe a pcregrep regex?
# pcregrep '(<= $dir)*(some_fancy_regex)' file.txt
break;;
esac
done
The user is presented with menu options:
1) /bin:
2) /sbin:
3) /usr/local/bin:
Make a selection
Suppose the user chooses 2. Currently, this just prints the chosen directory on the screen. I would like to display the directory as well as the files it contains.
/sbin:
file_a
file_b
file_c
From what I've read it seems like a pcre regex would work here. I barely understand non-pcre style regex. I'm trying to wrap my brain around positive and negative lookahead & lookbehind but I really don't know what I'm doing yet. If someone could help me figure this out I would appreciate it.
All directories begin with a / and end with :
File names listed under each directory may contain:
[a-z], [A-Z],[0-9]
Literal characters . _ - [
All directory / file structures end with a blank empty line
It can be done purely in bash 4 in a single pass without using any external tool. Here is the script to solve this problem:
#!/bin/bash
# declare an associative array
declare -A dirs=()
# loop through all lines and populate our associative array
# with dir as key and \n separated file names as value
while read -r; do
[[ -z $REPLY ]] && continue
if [[ $REPLY == *: ]]; then
d="$REPLY"
else
dirs["$d"]+=$'\n'"$REPLY"
fi
done < file.txt
# present a menu to the user and print the selected dir name with its file names
select dir in "${!dirs[@]}"; do
if [[ -n $dir ]]; then
printf '%s%s\n' "$dir" "${dirs[$dir]}"
break
fi
done
Output:
1) /usr/local/bin:
2) /bin:
3) /sbin:
#? 1
/usr/local/bin:
doc_a
doc_b
doc_c
and this:
1) /usr/local/bin:
2) /bin:
3) /sbin:
#? 3
/sbin:
file_a
file_b
file_c
With GNU sed and bash:
dir="/usr/local/bin:"
sed -n "/${dir//\//\/}/,/^$/{/^$/d;p}" file
With bash:
dir="/usr/local/bin:"
while IFS= read -r line; do
[[ $line == $dir ]] && switch=1
[[ $line == "" ]] && switch=0
[[ $switch == 1 ]] && echo "$line"
done < file
Output in both cases:
/usr/local/bin:
doc_a
doc_b
doc_c
Don't mistake shell for a text-processing tool, that's what awk is for. All you need are these 4 lines:
$ cat tst.sh
awk -v RS= -F'\n' -v OFS=') ' '{print NR, $1}' file.txt >&2
printf '\nMake a selection: ' >&2
IFS= read -r rsp
awk -v RS= -v nr="$rsp" 'NR==nr' file.txt
$ ./tst.sh
1) /bin:
2) /sbin:
3) /usr/local/bin:
Make a selection: 2
/sbin:
file_a
file_b
file_c
Grep is not the best tool for this because it is line-oriented; you can't really have grep look at expressions that span multiple lines, except with some contortion using the -z option, which is not specified by POSIX.
You could do it like this:
#!/bin/bash
PS3=$'\nMake a selection '
mapfile -t opts < <(grep ':' file.txt)
select dir in "${opts[@]}"; do
sed -n "\#$dir#,/^$/{/^$/q;p}" file.txt
break
done
First, I've changed your menu creation. Notice that you have a spare semicolon within the command substitution and a missing one after it; using grep like this would also break if there are spaces in the directory names. I've thus used mapfile to get all the lines containing : into an array.
Then, once I know about the directory, I use sed to print "from the directory name on until the next empty line". That would simply be
sed -n "/$dir/,/^$/p"
but this falls short on multiple fronts. First of all, the directory name can contain slashes, which trips up the / delimited addressing. We can use \%regexp% instead of /regexp/, where % can be any character; I've chosen #.
Now, we have
sed -n "\#$dir#,/^$/p"
That's almost there, but prints trailing blank lines; we suppress that by using {/^$/q;p} instead of just p, which says "if the line is blank, quit, else print it".
Sample output (edited to use a directory name with a space):
1) /bin blah:
2) /sbin:
3) /usr/local/bin:
Make a selection 1
/bin blah:
file_1
file_2
file_3
Remark: Non-GNU seds (like the one found in macOS) might complain about the two commands in curly braces; using {/^$/q;p;} instead (extra semicolon) might help.
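The sed address form described above can be tried without the interactive menu. This is a minimal sketch: the function name print_section is hypothetical, the input is read from standard input, and the directory name is assumed to contain no regex-special characters that would need escaping.

```shell
#!/bin/bash
# Hypothetical helper: print the section that starts at the line equal to $1
# and stop at the next blank line. Uses # as the address delimiter so that
# slashes in the directory name need no escaping.
print_section() {
    local dir=$1
    sed -n "\#^${dir}\$#,/^\$/{/^\$/q;p;}"
}

print_section "/sbin:" <<'EOF'
/bin:
file_1

/sbin:
file_a
file_b

/usr/local/bin:
doc_a
EOF
```

This prints /sbin: followed by file_a and file_b. The trailing semicolon inside the braces follows the remark about non-GNU seds.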

Find a string in a file name (shell script)

I am trying to use regex to match a file name and extract only a portion of the file name. My file names have this pattern: galax_report_for_Sample11_8757.xls, and I want to extract the string Sample11 in this case. I have tried the following regex, but it does not work for me, could someone help with the correct regex?
name=galax_report_for_Sample11_8757.xls
sampleName=$([[ "$name" =~ ^[^_]+_([^_]+) ]] && echo ${BASH_REMATCH[2]})
edit:
just found this works for me:
sampleName=$([[ "$name" =~ ^[^_]+_([^_]+)_([^_]+)_([^_]+) ]] && echo ${BASH_REMATCH[3]})
In a simple case like this, where you essentially have just a list of values separated by a single instance of a separator character each, consider using cut to extract the field of interest:
sampleName=$(echo 'galax_report_for_Sample11_8757.xls' | cut -d _ -f 4)
If you're using bash or zsh or ksh, you can make it a little more efficient:
sampleName=$(cut -d _ -f 4 <<< 'galax_report_for_Sample11_8757.xls')
Here is a slightly shorter alternative to the approach you used:
sampleName=$([[ "$name" =~ ^([^_]+_){3}([^_]+) ]] && echo ${BASH_REMATCH[2]})
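How the group index maps onto BASH_REMATCH can be checked with a short sketch: index 0 holds the whole match and index n holds the n-th parenthesized group. The function name sample_name is an assumption for illustration.

```shell
#!/bin/bash
# Hypothetical helper: print the fourth underscore-separated field of a name.
sample_name() {
    local re='^([^_]+_){3}([^_]+)'
    # Group 1 is the repeated "field_" prefix (it keeps its last repetition),
    # group 2 is the field that follows the third underscore.
    [[ $1 =~ $re ]] && printf '%s\n' "${BASH_REMATCH[2]}"
}

sample_name galax_report_for_Sample11_8757.xls   # prints: Sample11
```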

Bash Script sed command not working correctly with file passed through command line

Problem
I am trying to write a script that renames a large number of files according to a regex. The command works in an interactive shell in iTerm2, but the same command fails when run from the script.
Also, some of my file names include Chinese and Korean characters (I don't know whether that is the problem or not).
code
My code takes three inputs: the old regex, the new regex, and the files that need to be renamed.
Here is my code:
#!/bin/bash
# we have less than 3 arguments. Print the help text:
if [ $# -lt 3 ] ; then
cat << HELP
ren -- renames a number of files using sed regular expressions USAGE: ren 'regexp'
'replacement' files...
EXAMPLE: rename all *.HTM files into *.html:
ren 'HTM' 'html' *.HTM
HELP
exit 0
fi
OLD="$1"
NEW="$2"
# The shift command removes one argument from the list of
# command line arguments.
shift
shift
# $@ now contains all the files:
for file in "$@"; do
if [ -f "$file" ] ; then
newfile=`echo "$file" | sed "s/${OLD}/${NEW}/g"`
if [ -f "$newfile" ]; then
echo "ERROR: $newfile exists already"
else
echo "renaming $file to $newfile ..."
mv "$file" "$newfile"
fi
fi
done
I register the bash command in the .profile as:
alias ren="bash /pathtothefile/ren.sh"
Test
The original file name is "제01과.mp3" and I want it to become "第01课.mp3".
So with my script I use:
$ ren "제\([0-9]*\)과" "第\1课" *.mp3
And it seems that the sed in the script has not worked successfully.
But the following, which is exactly the same, does replace the name:
$ echo "제01과.mp3" | sed s/"제\([0-9]*\)과\.mp3"/"第\1课\.mp3"/g
Any thoughts? Thx
Print the result
I have made the following change to the script so that it prints what it is doing:
newfile=`echo "$file" | sed "s/${OLD}/${NEW}/g"`
echo "The ${file} is changed to ${newfile}"
And the result for my test is:
The 제01과.mp3 is changed into 제01과.mp3
ERROR: 제01과.mp3 exists already
So there is no format problem.
Update (all done under bash 4.2.45(2), Mac OS 10.9)
Testing
When I run the commands directly in bash, with a for loop, something interesting happens. I first stored all the names in a files.txt file using:
$ ls | grep mp3 > files.txt
Then I ran sed on them. A single command in interactive bash such as:
$ file="제01과.mp3"
$ echo $file | sed s/"제\([0-9]*\)과\.mp3"/"第\1课\.mp3"/g
gives
第01课.mp3
whereas the following, also in interactive mode:
files=`cat files.txt`
for file in $files
do
echo $file | sed s/"제\([0-9]*\)과\.mp3"/"第\1课\.mp3"/g
done
gives no changes!
And by now:
echo $file
gives:
$ 제30과.mp3
(There are only 30 files)
Problem Part
And I tried the first command which worked before:
$ echo $file | sed s/"제\([0-9]*\)과\.mp3"/"第\1课\.mp3"/g
It gives no changes as:
$ 제30과.mp3
So I create a new newfile and tried again as:
$ newfile="제30과.mp3"
$ echo $newfile | sed s/"제\([0-9]*\)과\.mp3"/"第\1课\.mp3"/g
And it gives correctly:
$第30课.mp3
WOW ORZ... Why! Why ! Why! And I try to see whether file and newfile are the same, and of course, they are not:
if [[ $file == $new ]]; then
echo True
else
echo False
fi
gives:
False
My guess
I guess there is some encoding problem, but I have found no reference for it. Could anyone help? Thx again.
Update 2
I seem to understand that there is a huge difference between a string and a file name. To be specific, if I directly set a variable like:
file="제30과.mp3"
in the script, sed works fine. However, if the variable comes from "$@", or is set like:
file=./*mp3
then sed fails to work. I don't know why. By the way, Mac sed has no -r option, and on Ubuntu -r does not solve the problem I describe above.
Some errors combined:
To write groups with plain ( ) instead of the escaped \( \) form, you need extended regex: -r in GNU sed (-E also works in both GNU and BSD sed), -E in grep
Escaping correctly is a beast :)
Example
files="제2과.mp3 제30과.mp3"
for file in $files
do
echo $file | sed -r 's/제([0-9]*)과\.mp3/第\1课.mp3/g'
done
outputs
第2课.mp3
第30课.mp3
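The two grouping syntaxes can be compared side by side in a small sketch. ASCII file names are used here as an assumption, to keep the comparison independent of any encoding question; the function names rename_bre and rename_ere are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: basic regex (BRE), groups written as \( ... \).
rename_bre() {
    printf '%s\n' "$1" | sed 's/track\([0-9]*\)\.mp3/song\1.mp3/'
}

# Hypothetical helper: extended regex (ERE) via -E, groups written as ( ... ).
rename_ere() {
    printf '%s\n' "$1" | sed -E 's/track([0-9]*)\.mp3/song\1.mp3/'
}

rename_bre track01.mp3   # prints: song01.mp3
rename_ere track01.mp3   # prints: song01.mp3
```

Both forms capture the digits and reuse them as \1 in the replacement; the only difference is how the parentheses are written.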
If you are not doing this as a programming project, but want to skip ahead to the part where it just works, I found these resources listed at http://www.tldp.org/LDP/GNU-Linux-Tools-Summary/html/x4055.htm:
MMV (and MCP, MLN, ...) utilities use a specialized syntax to perform bulk file operations on paths. (http://linux.maruhn.com/sec/mmv.html)
mmv before\*after.mp3 Before\#1After.mp3
Esomaniac, a Java alternative that also works on Windows, is apparently dead (home page is parked).
rename is a perl script you can download from CPAN: https://metacpan.org/release/File-Rename
rename 's/\.JPG$/.jpg/' *.JPG