Substring removal in bash - regex

I'm currently trying to get into bash regular expressions to change multiple filenames at the same time. Here are the file names:
a_001_D_xy_S37_L003_R1_001.txt
a_001_D_xy_S37_L003_R2_001.txt
a_002_D_xy_S37_L006_R1_001.txt
a_002_D_xy_S37_L006_R2_001.txt
a_003_D_xy_S23_L003_R1_001.txt
a_003_D_xy_S23_L003_R2_001.txt
I want this as my result:
a_002_D_xy_R1.txt
a_002_D_xy_R2.txt
...
I only want to change the files ending in 001.txt. First I want to remove the _S.._L00. part of the filename, and then the 001 at the end. I split this procedure into two parts:
for file in *001.txt;
do
echo ${file#_S.._L..6}
done
This loop already does not work. As a second alternative I tried:
for file in *001.fastq.gz;
do
echo ${file/_S.._L00./}
done
but the filenames are again unchanged. (I just use echo here to see the results. If it works I will replace it with mv ${file} ${regularexpression})
Thanks for help!

Considering that you need several different fields, it is probably easier to just split the filename and then reconstruct it as you wish.
I suggest using an array built by splitting the original filename on _. Then you just reconstruct the new name from the fields that you want.
for file in *001.txt; do
    echo "FILE: $file"
    IFS='_' read -r -a fileFields <<< "$file"
    echo "FILE FIELDS: "
    for index in "${!fileFields[@]}"; do
        echo "- $index ${fileFields[index]}"
    done
    fileName="${fileFields[0]}_${fileFields[1]}_${fileFields[2]}_${fileFields[3]}_${fileFields[-2]}.txt"
    echo "NEW FILE NAME: $fileName"
    # mv "$file" "$fileName"
done
The echo commands are just for debugging; you can remove them all once you understand the code.
However, if you really need to split the string using BASH expressions you can check this post:
Extracting part of a string to a variable in bash or take a look at this BASH cheat sheet.
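A side note on the attempts in the question: ${file#pattern} and ${file/pattern/} take glob patterns, not regular expressions (so . is a literal dot, not "any character"), and # only strips a matching prefix from the front of the string, which is why neither loop changed anything. A minimal sketch using glob patterns instead, assuming the names always follow the layout shown above:
for file in *001.txt; do
    new="${file/_S??_L00?/}"     # a_001_D_xy_S37_L003_R1_001.txt -> a_001_D_xy_R1_001.txt
    new="${new%_001.txt}.txt"    # a_001_D_xy_R1_001.txt -> a_001_D_xy_R1.txt
    echo "$file -> $new"
    # mv -- "$file" "$new"       # uncomment once the output looks right
done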

Try making a function. You'll first have to determine the number (n) of files:
n=$(ls *_001.txt | wc -l)
functionRename(){
  for (( i=1; i<=n; i++ ))
  do
    file=$(ls *_001.txt | head -n $i | tail -n 1)
    mv "${file}" "${file%_S??_*}${file#???????????????????}"
    file2=$(ls *_001.txt | head -n $i | tail -n 1)
    mv "${file2}" "${file2%_001*}.txt"
  done
}
functionRename

Related

BASH find regex for arbitrary range of numbers in a large number of files

I am writing a BASH script that, among other things, copies files from one directory to another based on input arguments for the start and end dates. The filenames are of the format YYYYMMDDhhmmss.jpg, e.g. 20161230143922.jpg. I am using find ... -exec cp {} ... because there are tens of thousands of files in the source directory. The input arguments are the start and end date in the format YYYYMMDD.
I know that I can't do a simple range in the regex like ($startdate..$enddate), but I am unable to figure out how to programmatically generate a regex that would work. If I had fewer files I could simply do cp {$startdate..$enddate} destination, but alas I don't think that is feasible.
I would like to copy all files between $startdate and $enddate that fall between the hours of 0500 and 1700. This would include images like 20170102060635.jpg and 20170104131255.jpg, but not 20170103010022.jpg.
This is what I have so far:
#!/bin/bash
STARTDATE=$1
ENDDATE=$2
FILE_NAME="review-${STARTDATE}-${ENDDATE}.mp4"
if [[ -n "$STARTDATE" ]]; then
echo "STARTDATE: $STARTDATE"
else
echo "Invalid start date: '$STARTDATE'"
echo "Syntax: ./create_time_lapse_date_range.sh <startdate> <enddate>"
exit
fi
if [[ -n "$ENDDATE" ]]; then
echo "ENDDATE: $ENDDATE"
else
echo "Invalid end date: '$ENDDATE'"
echo "Syntax: ./create_time_lapse_date_range.sh <startdate> <enddate>"
exit
fi
cd ~/Desktop/test\ timelapse
# Copy relevant files to local directory
find ~/Desktop/originals -regex "???????????????" -exec cp {} ~/Desktop/test\ timelapse/ \;
# Rename files to be sequential serial numbers
find ~/Desktop/test\ timelapse -name "*.jpg" | awk 'BEGIN{ a=0 }{ printf "mv \"%s\" ~/Desktop/\"test\ timelapse/%06d.jpg\"\n", $0, a++ }' | bash
# Generate timelapse video
ffmpeg -framerate 25 -i %06d.jpg -c:v libx264 -r 25 ${FILE_NAME}
Regex isn't the best tool for dealing with numerical ranges, so you may need to consider a solution that incorporates some logic outside the regex itself. Something like this:
REGEX="([0-9]{8})([0-9]{4})"
for f in ~/Desktop/originals/*.jpg
do
if [[ $f =~ $regex ]]
then
datepart=${BASH_REMATCH[1]}
timepart=${BASH_REMATCH[2]}
#if the DATE part matches
if (( $STARTDATE <= $datepart )) && (( $datepart <= $ENDDATE ))
then
#if the TIME part matches
if [[ $timepart =~ "(0[5-9]|1[0-7])" ]]
then
# copy file ...
fi
fi
fi
done
Pure Regex Solution
If you really want a pure regex solution, this will help demonstrate the complexity. Here's a regex to find all the files in the 0500 to 1700 timeframe, for dates in January 2017: ^201701\d{2}(0[5-9]|1[0-7])\d{4}\.jpg$
Notice the regex pattern needed to match times from 0500 to 1700:
(0[5-9]|1[0-7])
It's not pretty, and that's with a hardcoded range. To deal with dynamic start and end dates, you would be building a similar pattern dynamically. It could be done, but why use regex for it?
Here's an example, showing what you would need to generate for a date range from 20161225 to 20170114:
^(201612(2[5-9]|3\d)|201701(0\d|1[0-4]))(0[5-9]|1[0-7])\d{4}\.jpg$
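For completeness, a generated pattern like that could be plugged into the find command from the question. A sketch, assuming GNU find (which matches -regex against the whole path, hence the leading .*/, needs -regextype posix-extended, and wants [0-9] instead of \d); on BSD/macOS you would use find -E instead:
find ~/Desktop/originals -regextype posix-extended \
    -regex '.*/(201612(2[5-9]|3[0-9])|201701(0[0-9]|1[0-4]))(0[5-9]|1[0-7])[0-9]{4}\.jpg' \
    -exec cp {} ~/Desktop/test\ timelapse/ \;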

Recreate output of tail -n to text files

I had a bunch of bash scripts in a directory that I "backed up" by running $ tail -n +1 -- *.sh
The output of that tail is something like:
==> do_stuff.sh <==
#! /bin/bash
cd ~/my_dir
source ~/my_dir/bin/activate
python scripts/do_stuff.py
==> do_more_stuff.sh <==
#! /bin/bash
cd ~/my_dir
python scripts/do_more_stuff.py
These are all fairly simple scripts with 2-10 lines.
Given the output of that tail, I want to recreate all of the above files with the same content.
That is, I'm looking for a command that can ingest the above text and create do_stuff.sh and do_more_stuff.sh with the appropriate content.
This is more of a one-off task so I don't really need anything robust, and I believe there are no big edge cases given that the files are simple (e.g. none of the files actually contain ==> in them).
I started by trying to come up with a matching regex, which would probably look something like (==>.*\.sh <==)(.*)(==>.*\.sh <==), but I'm stuck on actually getting it to capture the filename and content and write the content out to a file.
Any ideas?
Presuming your backup file is named backup.txt:
perl -ne "if (/==> (\S+) <==/){open OUT,'>',$1;next}print OUT $_" backup.txt
The version above is for Windows (cmd.exe quoting).
Fixed version for *nix:
perl -ne 'if (/==> (\S+) <==/){open OUT,">",$1;next}print OUT $_' backup.txt
#!/bin/bash
while read -r line; do
    if [[ $line =~ ^==\>[[:space:]](.*)[[:space:]]\<==$ ]]; then
        out="${BASH_REMATCH[1]}"
        continue
    fi
    printf "%s\n" "$line" >> "$out"
done < backup.txt
Drawback: extra blank line at the end of every created file except the last one.
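If a one-liner is preferred, an awk sketch that does the same thing (same blank-line caveat, and assuming none of the script names contain spaces):
awk '/^==> .* <==$/ { out = $2; next } out { print > out }' backup.txt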

Using a variable in sed search pattern when the value of the variable contains square brackets

What I'm trying to do is check that a file has been created. The best way I can think of to do this is by listing the files beforehand, listing them afterwards, deleting the before list from the after list, then checking whether the after list is non-empty. I ran into trouble deleting the before list from the after list: filenames with square brackets were not being deleted from the list.
while read -r LINE
do
sed -i -- "/$LINE/d" listfilesafter.swp #without the -- I get 'sed: 1: "listfilesafter.swp": extra characters at the end of l command'
rm listfilesafter.swp--
done < listfilesbefore.swp
If I use '' then the variable doesn't get expanded, and the -r option on read doesn't seem to make it work the way I expected. If anyone has any suggestions on alternative ways of doing this, do contribute, but I would still like to know how to use a variable in the search pattern when the value of the variable contains metacharacters. If anyone can help remove the code smell of "rm listfilesafter.swp--" then that would also be appreciated. Full code below:
cd ~/Desktop
ls >listfilesbefore.swp
#echo "balh blah" >SomeNonZeroFile.txt #comment or uncomment to test the if then statement
ls >listfilesafter.swp
sed -i -- '/listfilesafter.swp/d' listfilesafter.swp #deletes listfilesafter.swp from the list of files create after the event on line 3
while read -r LINE
do
sed -i -- "/$LINE/d" listfilesafter.swp #without the -- I get 'sed: 1: "listfilesafter.swp": extra characters at the end of l command'
rm listfilesafter.swp--
done < listfilesbefore.swp
cat listfilesafter.swp
echo "check listfiles. Enter to continue."
read dummy_variable
if [ -s listfilesafter.swp ]
then
rm listfilesbefore.swp
rm listfilesafter.swp
echo "success, the file was created"
else
rm listfilesbefore.swp
rm listfilesafter.swp
echo "failure, the file was not created"
fi
Given that you have two lists of files in sorted order (since ls lists the files in sorted order), you should probably be using a command like diff or, in this case, comm to find the differences between the two lists of files.
If you want to know which file(s) were created, then that's the list of files (lines) in the second file that are not in the first. With no options, comm lists the lines it reads in 3 columns:
column 1: lines in the first file not in the second
column 2: lines in the second file not in the first
column 3: lines in both files
You only need the lines (file names) in the second column, and therefore you want to suppress the list of files in the first and third columns, so you'll use comm -13 to do that:
before=$(mktemp ${TMPDIR:-/tmp}/files.XXXXXX)
after=$(mktemp ${TMPDIR:-/tmp}/files.XXXXXX)
trap "rm -f $before $after; exit 1" 0 1 2 3 13 15
ls > $before
…execute command that creates file(s)…
ls > $after
comm -13 $before $after
rm -f $before $after
trap 0
Obviously, you could capture the list of files from comm in a variable for further analysis, etc.
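For example, to reproduce the success/failure check from the question, the bare comm line above could be captured instead (a sketch):
created=$(comm -13 $before $after)
if [ -n "$created" ]
then
    echo "success, the file was created"
else
    echo "failure, the file was not created"
fi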
Making sed work when the search strings contain metacharacters
I'm still confused about sed. How do I use a variable in the search pattern of sed if the value contains metacharacters? Or in this case would I be better off using something other than sed?
In the scenario you have, you're far better off not using sed, and in any case your technique is horrendously slow if there are hundreds or thousands of files in the directory (running sed once per file name is not going to be fast).
However, supposing that it was necessary to use sed and that you wanted to deal with metacharacters in the file names in the list, then you would have to escape the metacharacters (with a backslash in front). I'd probably do something like this:
sed 's/[][\/*.]/\\&/g; s%.*%/^&$/d%' listfilesbefore.swp > script.sed
sed -f script.sed listfilesafter.swp
The first script takes any metacharacters in the line (file name) and replaces it with backslash-metacharacter. In the first substitute, the [][\/*.] character class matches square brackets, two types of slashes, stars and dots. Depending on the predilections of the variant of sed you're using, you might need to protect (){} with backslashes too, but in POSIX standard sed, the {} gain metacharacter meaning when prefixed with a backslash, so they're not modified by default. The second substitute takes the possibly modified line and converts it into a 'match and delete' command. The output, therefore, is a sed script that will delete the file names found in listfilesbefore.swp. The second command applies that script to listfilesafter.swp, doing in one sed command what your outline code does with one run of sed per file name.
Using sed to generate a sed script is a powerful technique. It isn't always appropriate, but when it is, it is very useful.
Shell script demo.sh
echo "Pre-populate the directory with some random file names"
for file in $(random -n 20 -T '%W%V%C-%w%v%c%v%c-%04[0000:9999]d.txt')
do
cp /dev/null $file
done
for template in '%w%v%w(%03[000:999]d)%w%v%w.txt' \
'%w%v%w[123]%w%v%we.txt' \
'%w%v%wfile*%03[0:999]d*.txt' \
'%w%v%w%v%c\\\%d.txt' \
'%w%v%w-{%04X}-{%04X}.txt'
do
for file in $(random -n 2 -T "$template")
do
cp /dev/null "$file"
done
done
ls > listfilesbefore.swp
ls
echo
echo "Create some new files with metacharacters in the names"
for file in 'new(123)file.txt' 'new[123]file.txt' 'newfile*321*.txt' \
'newfile\\\.txt' 'newfile-{A39F}-{B77D}.txt'
do
cp /dev/null "$file"
done
ls
ls > listfilesafter.swp
echo
echo "Create sed script"
sed 's/[][\/*.]/\\&/g; s%.*%/^&$/d%' listfilesbefore.swp > script.sed
echo
cat script.sed
echo
echo "Apply it"
sed -f script.sed listfilesafter.swp
The random command I'm using is of my own devising, but it is convenient for demonstrations such as this.
Example run
Pre-populate the directory with some random file names
AIG-taral-3486.txt
COV-oipuc-9088.txt
CUG-vowan-5758.txt
FEH-ieqek-0603.txt
IUS-aaduw-7080.txt
KER-jazuc-4824.txt
MIZ-iezec-8255.txt
NIT-kupib-6873.txt
PUX-oocov-2216.txt
QAW-xonod-3937.txt
QES-wawok-4790.txt
RON-difag-1986.txt
SAD-gesug-5706.txt
SAJ-luqoj-4311.txt
TUZ-wapaw-8547.txt
VAL-zutap-8054.txt
YIP-xudeb-7397.txt
YUP-uudiv-8848.txt
ZIB-jurax-2903.txt
ZUR-xonik-8800.txt
aavfile*147*.txt
demo.sh
diman\\\7115.txt
ganur\\\8732.txt
gud-{7049}-{3103}.txt
listfilesbefore.swp
lur[123]maee.txt
rivfile*065*.txt
ueo(417)yea.txt
uoi(751)qio.txt
woi-{37E8}-{009C}.txt
xof[123]hoxe.txt
Create some new files with metacharacters in the names
AIG-taral-3486.txt
COV-oipuc-9088.txt
CUG-vowan-5758.txt
FEH-ieqek-0603.txt
IUS-aaduw-7080.txt
KER-jazuc-4824.txt
MIZ-iezec-8255.txt
NIT-kupib-6873.txt
PUX-oocov-2216.txt
QAW-xonod-3937.txt
QES-wawok-4790.txt
RON-difag-1986.txt
SAD-gesug-5706.txt
SAJ-luqoj-4311.txt
TUZ-wapaw-8547.txt
VAL-zutap-8054.txt
YIP-xudeb-7397.txt
YUP-uudiv-8848.txt
ZIB-jurax-2903.txt
ZUR-xonik-8800.txt
aavfile*147*.txt
demo.sh
diman\\\7115.txt
ganur\\\8732.txt
gud-{7049}-{3103}.txt
listfilesbefore.swp
lur[123]maee.txt
new(123)file.txt
new[123]file.txt
newfile*321*.txt
newfile-{A39F}-{B77D}.txt
newfile\\\.txt
rivfile*065*.txt
ueo(417)yea.txt
uoi(751)qio.txt
woi-{37E8}-{009C}.txt
xof[123]hoxe.txt
Create sed script
/^AIG-taral-3486\.txt$/d
/^COV-oipuc-9088\.txt$/d
/^CUG-vowan-5758\.txt$/d
/^FEH-ieqek-0603\.txt$/d
/^IUS-aaduw-7080\.txt$/d
/^KER-jazuc-4824\.txt$/d
/^MIZ-iezec-8255\.txt$/d
/^NIT-kupib-6873\.txt$/d
/^PUX-oocov-2216\.txt$/d
/^QAW-xonod-3937\.txt$/d
/^QES-wawok-4790\.txt$/d
/^RON-difag-1986\.txt$/d
/^SAD-gesug-5706\.txt$/d
/^SAJ-luqoj-4311\.txt$/d
/^TUZ-wapaw-8547\.txt$/d
/^VAL-zutap-8054\.txt$/d
/^YIP-xudeb-7397\.txt$/d
/^YUP-uudiv-8848\.txt$/d
/^ZIB-jurax-2903\.txt$/d
/^ZUR-xonik-8800\.txt$/d
/^aavfile\*147\*\.txt$/d
/^demo\.sh$/d
/^diman\\\\\\7115\.txt$/d
/^ganur\\\\\\8732\.txt$/d
/^gud-{7049}-{3103}\.txt$/d
/^listfilesbefore\.swp$/d
/^lur\[123\]maee\.txt$/d
/^rivfile\*065\*\.txt$/d
/^ueo(417)yea\.txt$/d
/^uoi(751)qio\.txt$/d
/^woi-{37E8}-{009C}\.txt$/d
/^xof\[123\]hoxe\.txt$/d
Apply it
listfilesafter.swp
new(123)file.txt
new[123]file.txt
newfile*321*.txt
newfile-{A39F}-{B77D}.txt
newfile\\\.txt

Bash: Replace array value with curl result

I have a text file named raw.txt with something like the following:
T DOTTY CRONO 52/50 53/40 54/30 55/20 RESNO NETKI
U CYMON DENDU 51/50 52/40 53/30 54/20 DOGAL BEXET
V YQX KOBEV 50/50 51/40 52/30 53/20 MALOT GISTI
W VIXUN LOGSU 49/50 50/40 51/30 52/20 LIMRI XETBO
X YYT NOVEP 48/50 49/40 50/30 51/20 DINIM ELSOX
Y DOVEY 42/60 44/50 47/40 49/30 50/20 SOMAX ATSUR
Z SOORY 43/50 46/40 48/30 49/20 BEDRA NERTU
A DINIM 51/20 52/30 50/40 47/50 RONPO COLOR
B SOMAX 50/20 51/30 49/40 46/50 URTAK BANCS
C BEDRA 49/20 50/30 48/40 45/50 VODOR RAFIN
D ETIKI 48/15 48/20 49/30 47/40 44/50 BOBTU JAROM
E 46/40 43/50 42/60 DOVEY
F 45/40 42/50 41/60 JOBOC
G 43/40 41/50 40/60 SLATN
I'm reading it into an array:
while read line; do
set $line
IFS=' ' read -a array <<< "$line"
done < raw.txt
I'm trying to replace all occurrences of [A-Z]{5} with a curl result, where the [A-Z]{5} match is fed as a variable into the curl call.
First match to be replaced would be DOTTY. The call looks similar to curl -s http://example.com/api_call/DOTTY and the result is something like -55.5833 50.6333 which should replace DOTTY in the array.
I was so far unable to correctly match the desired string and feed the match into curl.
Your help is greatly appreciated.
All the best,
Chris
EDIT:
Solution
Working solution based on @Kevin's extensive answer and @Floris's hint about a possible carriage return in the curl result. This was indeed the case. Thank you! Combined with some tinkering on my side, I now got it to work.
#!/bin/bash
while read line; do
    set $line
    IFS=' ' read -a array <<< "$line"
    i=0
    for str in ${array[@]}; do
        if [[ "$str" =~ [A-Z]{5} ]]; then
            curl_tmp=$(curl -s http://example.com/api_call/$str)
            # cut off line break
            curl=${curl_tmp/$'\r'}
            # insert at given index
            declare array[$i]="$curl"
        fi
        let i++
    done
    # write to file
    for index in "${array[@]}"; do
        echo $index
    done >> $WORK_DIR/nats.txt
done < raw.txt
I didn't change anything about your script except add the matching part, since it seems that's what you're needing help on:
#!/bin/bash
while read line; do
    set $line
    IFS=' ' read -a array <<< "$line"
    for str in ${array[@]}; do
        if [[ "$str" =~ [A-Z]{5} ]]; then
            echo curl "http://example.com/api_call/$str"
        fi
    done
done < raw.txt
EDIT: added the URL example you provided, with the variable in the URI. You can do whatever you need with the fetched output by changing it to do_something "$(curl ...)"
EDIT2: Since you're wanting to maintain the bash array you create from each line, how about this:
I'm not great at bash when it comes to arrays, so I expect someone to call me out on it, but this should work.
I've left some echos there so you can see what it's doing. The shift commands are to push the array index from the current location when the regex matches. The tmp variable to hold your curl output could probably be improved, but this should get you started, I hope.
removed temporarily to avoid confusion
EDIT3: Oops the above didn't actually work. My mistake. Let me try again here.
EDIT4:
#!/bin/bash
while read line; do
    set $line
    IFS=' ' read -a array <<< "$line"
    i=0
    # echo ${array[@]} below is just so you can see it before processing. You can remove this
    echo "Array before processing: ${array[@]}"
    for str in ${array[@]}; do
        if [[ "$str" =~ [A-Z]{5} ]]; then
            # replace the echo command below with your curl command
            # ie - curl="$(curl http://example.com/api_call/$str)"
            curl="$(echo 1234 -1234)"
            if [[ "$flag" = "1" ]]; then
                array=( ${adjustedArray[@]} )
                push=$(( $push + 2 ));
                let i++
            else
                push=1
            fi
            adjustedArray=( ${array[@]:0:$i} ${curl[@]} ${array[@]:$(( $i + $push )):${#array[@]}} )
            #echo "DEBUG adjustedArray in loop: ${adjustedArray[@]}"
            flag=1;
        fi
        let i++
    done
    unset flag
    echo "final: ${adjustedArray[@]}"
    # do further processing here
done < raw.txt
I know there's a smarter way to do this than the above, but we're getting into areas in bash where I'm not really suited to give advice. The above should work, but I'm hoping someone can do better.
Hope it helps, anyway
PS - You should probably not use a shell script for this unless you really need to. Perl, PHP, or Python would make the code simpler and more readable.
Since I misread the first time:
How about just using sed?
sed "s/\([A-Z]\{5\}\)/$(echo curl http:\\/\\/example.com\\/api_call\\/\\1)/g" /tmp/raw.txt
Try that, then try removing the echo. I'm not 100% on this since I can't run it on the real domain
EDIT: And just so I'm clear, the echo is just there so you can see what it will do with the echo removed
create a file cmatch:
#!/bin/bash
while read line
do
    a=`echo $line | egrep -o '\b[A-Z]{5}\b'`
    for v in $a
    do
        echo "doing curl to replace $v in $line" >&2
        r=`curl -s http://example.com/api_call/$v`
        r1=`echo $r | xargs echo`
        line=`echo $line | sed "s/$v/$r1/"`
    done
    echo $line
done
then call it with
chmod 755 cmatch
./cmatch < inputfile.txt > outputfile.txt
It will do what you asked
Notes:
the \b before and after the [A-Z]{5} ensures that ABCDEFG (which is not a five letter word) will not match.
using egrep -o produces the list of matches, one per line
I loop over this list to allow the replacement of multiple matches in a line
I update the line for each match found using the result of the curl call
to keep code clean, I assign the result of the curl to an intermediate variable
edit Just saw the comments about arrays. I suggest taking the output of this script and converting it to an array if you want to do further manipulation...
more edits If your curl command returns a multi-line string (which would explain the error you see), you can use the new line I introduced in the script to remove the newlines (essentially stringing all the arguments together):
echo $r | xargs echo
calls echo with all the lines as arguments, so the line breaks disappear. It's a fun way of getting rid of them.
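A quick illustration of that flattening, with a hypothetical two-line value standing in for the curl output:
r=$'12.3456\n45.6789'
echo "$r" | xargs echo    # prints: 12.3456 45.6789 on a single line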
#!/bin/bash
while read line; do
    set -- $line
    echo "second parm is $2"
    echo "do your curl here"
done < afile.txt

Add specific source file types to svn recursively

I know this is an ugly script but it does the job.
What I am facing now is adding a few more extensions, which would clutter the script even more.
How can I make it more modular?
Specifically, how can I write this long regular expression (source file extensions) on multiple lines, say one extension per line? I guess I am doing something wrong with string concatenation, but I'm not quite sure what exactly.
Here's the original file:
#!/bin/bash
COMMAND='svn status'
XARGS='xargs'
SVN='svn add'
$COMMAND | grep -E '(\.m|\.mat|\.java|\.js|\.php|\.cpp|\.h|\.c|\.py|\.hs|\.pl|\.xml|\.html|\.sh|.\asm|\.s|\.tex|\.bib|.\Makefile|.\jpg|.\gif|.\png|.\css)'$ | awk ' { print$2 } ' | $XARGS $SVN
and here's roughly what I am aiming at
...code code
'(.\m|
\.mat|
\.js|
.
.
.
.\css)'
..more code here
Anybody?
I know this doesn't answer the question directly, but from a readability perspective, I think most developers would agree that a single-line regex is the most common way to do things and therefore the most maintainable approach. Further, I'm not sure why you're including a period with each of your extensions; it only needs to appear once.
I wrote this little script to automatically add all images to svn. You should be able to simply add extensions between the pipes in the regex to add or remove different file types. Note that it makes sure to only add files that are unrecognized by making sure each line starts with a "?" (^\?) and ends with a period (\.(extensions)$). Hope it's helpful!
#!/bin/bash
svn st | grep -E "^\?.*\.(png|jpg|jpeg|tiff|bmp|gif)$" > /tmp/svn-auto-add-img
while read output; do
    FILE=$(echo $output | awk '{ print $2 }')
    svn add $FILE
done < /tmp/svn-auto-add-img
exit 0
exit 0
How about this:
PATTERNS="
\.foo
\.bar
\.baz"
# Put them into one list separated by or ("|").
PATTERNS=`echo $PATTERNS |sed 's/\s\+/|/g'`
$COMMAND | grep -E "($PATTERNS)"
(Note that this would not work if you put quotes around $PATTERNS in the call to echo -- echo is taking care of stripping whitespace and converting newlines to spaces for us.)
#!/bin/bash
COMMAND='svn status'
XARGS='xargs'
SVNADD='svn add'
pats=
pats+=' \.m'
pats+=' \.mat'
pats+=' \.java'
pats+=' \.js'
# add your 'or-able' sub patterns here
# build the full pattern
pattern='('
for pat in $pats; do
    pattern+="$pat|"
done
pattern=${pattern%\|}')$'
# run grep with the generated pattern
files=$($COMMAND | grep -E ${pattern} | awk ' { print $NF } ')
if [ " $files" != " " ]
then
    $COMMAND | grep -E ${pattern} | awk ' { print $NF } ' | $XARGS $SVNADD
fi