Bash variable in path regexp [duplicate] - regex

I have a folder with files named as file_1.ext...file_90.ext. I can list a range of them with the following command:
$ ls /home/rasoul/myfolder/file_{6..19}.ext
but when I want to use this command inside a bash script, it doesn't work. Here is a minimal example:
#!/bin/bash
DIR=$1
st=$2
ed=$3
FILES=`ls ${DIR}/file\_\{$st..$ed\}.ext`
for f in $FILES; do
echo $f
done
running it as,
$ bash test_script.sh /home/rasoul/myfolder 6 19
outputs the following error message:
ls: cannot access /home/rasoul/myfolder/file_{6..19}.ext: No such file or directory

Brace expansion happens before variable expansion, so {$st..$ed} is never seen as a numeric range.
(Also, don't parse ls output.) You could instead say:
for f in $(seq $st $ed); do
echo "${DIR}/file_${f}.ext";
done

BASH always does brace expansion before variable expansion which is why ls is looking for a file /home/rasoul/myfolder/file_{6..19}.ext.
I personally use seq when I need to expand a number range that has variables in it. You could also use eval with echo to accomplish the same thing:
eval echo {$st..$ed}
But even if you used seq in your script, ls would not iterate over your range without a loop. If you want to check if files in the range exist, I would also avoid using ls here as you will get errors for every file in the range that doesn't exist. BASH can check if a file exists using -e.
Here is a loop that would check if a file exists within the range between variables $st and $ed and print it if it does:
for n in $(seq "$st" "$ed"); do
f="${DIR}/file_$n.ext"
if [ -e "$f" ]; then
echo "$f"
fi
done
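If you would rather not spawn seq at all, Bash's arithmetic for loop accepts variables directly. The following is a minimal sketch, assuming Bash rather than plain sh; list_range and demo_dir are names made up for this illustration and do not come from the answers above.

```shell
#!/usr/bin/env bash
# list_range DIR START END (hypothetical helper):
# print DIR/file_N.ext for every N in START..END that actually exists.
list_range() {
    local dir=$1 st=$2 ed=$3 n f
    for (( n = st; n <= ed; n++ )); do
        f="$dir/file_$n.ext"
        if [ -e "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
}

# Demo in a throwaway directory holding file_1.ext .. file_10.ext.
demo_dir=$(mktemp -d)
touch "$demo_dir"/file_{1..10}.ext
list_range "$demo_dir" 8 12    # file_11 and file_12 do not exist and are skipped
```

Because the loop counts numerically, the names come out in numeric order (8, 9, 10) rather than in the lexical order that ls would use.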

The range pattern {A..B} does not accept variables for A or B. You need constants for them.
A workaround might be to start a subshell like this:
RESULT=$(bash -c "ls {$a..$b}")

Numeric ranges have to be literal numbers, you can't put variables in there. To do it you need to use eval:
FILES=`eval "ls ${DIR}/file_{$st..$ed}.ext"`
Here's a transcript of my test (I tried it in bash 4.1.5 and 3.2.48).
imac:testdir $ touch file_{1..30}.ext
imac:testdir $ st=6
imac:testdir $ ed=20
imac:testdir $ DIR=.
imac:testdir $ FILES=`eval "ls ${DIR}/file_{$st..$ed}.ext"`
imac:testdir $ echo "$FILES"
./file_10.ext
./file_11.ext
./file_12.ext
./file_13.ext
./file_14.ext
./file_15.ext
./file_16.ext
./file_17.ext
./file_18.ext
./file_19.ext
./file_20.ext
./file_6.ext
./file_7.ext
./file_8.ext
./file_9.ext
imac:testdir $

Related

Changing file extensions with sed [duplicate]

If the arguments are files, I want to change their extensions to .file.
That's what I got:
#!/bin/bash
while [ $# -gt 0 ]
do
if [ -f $1 ]
then
sed -i -e "s/\(.*\)\(\.\)\(.*\)/\1\2file" $1
fi
shift
done
The script runs, but it doesn't do anything. Another problem: if a file has no extension, my sed command will not work, right? Please help.
sed is for manipulating the contents of files, not the filename itself.
Option 1, taken from this answer by John Smith:
filename="file.ext1"
mv "${filename}" "${filename/%ext1/ext2}"
Option 2, taken from this answer by chooban:
rename 's/\.ext/\.newext/' ./*.ext
Option 3, taken from this answer by David W.:
$ find . -name "*.ext1" -print0 | while IFS= read -r -d '' file
do
mv "$file" "${file%.*}.ext2"
done
and more is here.
UPDATE (a comment asked what % and {} do):
"${variable}other_chars" expands a variable inside a string; the braces mark where the variable name ends. Inside the braces, %.* means: take the value of the variable and strip the shortest match of the pattern .* from the end of the value. For example, if variable is filename.txt, "${variable%.*}" returns just filename.
Using a shell function to wrap a sed evaluate (e) command:
mvext ()
{
ext="$1";
while shift && [ "$1" ]; do
sed 's/.*/mv -iv "&" "&/
s/\(.*\)\.[^.]*$/\1/
s/.*/&\.'"${ext}"'"/e' <<< "$1";
done
}
Tests, given files bah and boo, and the extension should be .file, which is then changed to .buzz:
mvext file bah boo
mvext buzz b*.file
Output:
'bah' -> 'bah.file'
'boo' -> 'boo.file'
'bah.file' -> 'bah.buzz'
'boo.file' -> 'boo.buzz'
How it works:
The first arg is the file extension, which is stored in $ext.
The while loop handles each file name separately, since a name might include escaped spaces and whatnot. If the filenames were certain not to contain such characters, the while loop could probably be avoided.
sed reads standard input, provided by a bash here string <<< "$1".
The sed code changes each name foo.bar (or even just plain foo) to the string mv -iv "foo.bar" "foo.file", then runs that string with the evaluate command. The -iv options show what's been moved and prompt before an existing file would be overwritten.
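For comparison, the same rename can be done without sed, using the suffix-removal expansion from the earlier answer. This is a sketch rather than a drop-in replacement: set_ext is a hypothetical helper name, the demo files live in a temp directory, and a dotfile such as .bashrc would lose its whole name, so the sketch assumes ordinary file names.

```shell
#!/bin/sh
# How the removal operators behave on a sample name:
name="archive.tar.gz"
echo "${name%.*}"     # shortest ".*" removed from the end   -> archive.tar
echo "${name%%.*}"    # longest ".*" removed from the end    -> archive
echo "${name##*.}"    # longest "*." removed from the start  -> gz

# set_ext EXT FILE... (hypothetical helper): give each regular file the
# extension EXT, replacing its last extension or appending if it has none.
set_ext() {
    ext=$1
    shift
    for f in "$@"; do
        [ -f "$f" ] || continue
        # Split off the directory so a dot in a directory name is left alone.
        case $f in
            */*) dir=${f%/*}/ base=${f##*/} ;;
            *)   dir= base=$f ;;
        esac
        mv -- "$f" "$dir${base%.*}.$ext"
    done
}

# Demo in a throwaway directory.
t=$(mktemp -d)
touch "$t/bah" "$t/boo.txt" "$t/a b.tar.gz"
set_ext file "$t"/*
ls "$t"
```

Unlike the sed version, nothing is passed through a second round of shell evaluation, so quotes or other special characters in a file name need no special handling.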

How to match the regex for the below pattern?

I am trying to write a script that works as described below, but I can't get the syntax right.
I have folders like S_12_O_319_K4me1.
Each folder contains a file like S_12_O_319_K4me1_S12816.sorted.bam.
So I want my script to loop over the folders, find the *.bam file in each one, and run an operation on it, but I can't get the regex right. This is what I tried:
#!/bin/bash
#$ -S /bin/bash
spp_run=/path/phantompeakqualtools/run_spp.R
bam_loc=/path/ChIP-Seq/output
samples="S_12_O_319_K27me3
S_12_O_319_K4me1
S_12_O_319_K4me3
S_12_O_319_K27ac"
for s in $samples; do
echo "Running SPP on $s ..."
Rscript $spp_run -c=$bam_loc/$s/${s}_S[[0-9]+\.sorted.bam -savp -out=$bam_loc/$s/${s}".run_spp.out"
done
The digits are not matched by the above regex.
Where am I getting it wrong?
Edit:
I tried the version below and it still does not work. The problem is in the argument parsing in the Rscript, but why would this be a problem?
#!/bin/bash
#$ -S /bin/bash
spp_run=/path/tools/phantompeakqualtools/run_spp.R
bam_loc=/path/ChIP-Seq/output
samples="S_12_O_319_K27me3
S_12_O_319_K4me1
S_12_O_319_K4me3"
for s in $samples; do
echo "Running SPP on $s ..."
echo $bam_loc/$s/${s}_S*.sorted.bam
inbam=$bam_loc/$s/${s}_S*.sorted.bam
echo $inbam
Rscript $spp_run -c=$inbam -savp -out=$bam_loc/$s/${s}".run_spp.out"
done
echo "done"
Error
Error in parse.arguments(args) :
ChIP File:/path/ChIP-Seq/output/S_12_O_319_K27me3/S_12_O_319_K27me3_S*.sorted.bam does not exist
Execution halted
The file is not recognized, even though echo $inbam prints /path/ChIP-Seq/output/S_12_O_319_K27me3/S_12_O_319_K27me3_S12815.sorted.bam
You can use a regex in a find command :
export spp_run=/path/phantompeakqualtools/run_spp.R
export bam_loc=/path/ChIP-Seq/output
export dir
samples=(S_12_O_319_K27me3 S_12_O_319_K4me1 S_12_O_319_K4me3 S_12_O_319_K27ac)
for dir in ${samples[@]}; do
find . -type f -regex ".*/*${dir}_S[0-9]+\.sorted\.bam" \
-exec bash -c 'echo Rscript $spp_run -c=$bam_loc/${dir}/${1##*/} -savp -out=$bam_loc/${dir}/${dir}".run_spp.out"' _ {} \;
done
Note : just remove the echo before the Rscript if the output meets your needs.
I found an answer to my query; the code is below. I realized that the Rscript requires the full path and the full file name, so I assigned the output of the echo command to a variable and passed that to the Rscript as the input file argument. It now gets a full path with the full file name, so it recognizes the input file.
Not an elegant way, but it works for me.
#!/bin/bash
#$ -S /bin/bash
spp_run=/path/tools/phantompeakqualtools/run_spp.R
bam_loc=/path/ChIP-Seq/output
samples="S_12_O_319_K27me3
S_12_O_319_K4me1
S_12_O_319_K4me3"
for s in $samples; do
echo "Running SPP on $s ..."
echo $bam_loc/$s/${s}_S*.sorted.bam
inbam=$bam_loc/$s/${s}_S*.sorted.bam
echo $inbam
infile=`echo $inbam`
Rscript $spp_run -c=$infile -savp -out=$bam_loc/$s/${s}".run_spp.out"
done
echo "done"
Thanks everyone for the suggestions and comments. My code is not elegant but it is working so I put the answer here.
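A likely explanation for the earlier error: the shell expands a glob only when the whole word matches an existing path. In -c=$inbam the word begins with -c=, so nothing matches and the pattern is passed on literally, whereas in echo $inbam the word is the bare path and does match. The sketch below expands the glob first, into a Bash array, and then passes the resolved name. It assumes Bash; the directory layout is faked in a temp directory and echo stands in for the real Rscript call.

```shell
#!/usr/bin/env bash
# Fake layout for the demo; the real bam_loc would be the output directory.
bam_loc=$(mktemp -d)
s=S_12_O_319_K4me1
mkdir -p "$bam_loc/$s"
touch "$bam_loc/$s/${s}_S12816.sorted.bam"

# Expand the glob into an array. The variable parts are quoted; the
# wildcard is left unquoted so that the shell expands it.
matches=( "$bam_loc/$s/${s}"_S*.sorted.bam )

# With no match the pattern stays literal, so test that the result exists.
if [ -e "${matches[0]}" ]; then
    inbam=${matches[0]}
    # echo stands in for the real Rscript invocation.
    echo Rscript run_spp.R -c="$inbam" -savp -out="$bam_loc/$s/$s.run_spp.out"
else
    echo "no .sorted.bam file found for $s" >&2
fi
```

The array also makes it easy to notice when more than one file matches, by checking ${#matches[@]}.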

Recreate output of tail -n to text files

I had a bunch of bash scripts in a directory that I "backed up" doing $ tail -n +1 -- *.sh
The output of that tail is something like:
==> do_stuff.sh <==
#! /bin/bash
cd ~/my_dir
source ~/my_dir/bin/activate
python scripts/do_stuff.py
==> do_more_stuff.sh <==
#! /bin/bash
cd ~/my_dir
python scripts/do_more_stuff.py
These are all fairly simple scripts with 2-10 lines.
Given the output of that tail, I want to recreate all of the above files with the same content.
That is, I'm looking for a command that can ingest the above text and create do_stuff.sh and do_more_stuff.sh with the appropriate content.
This is more of a one-off task so I don't really need anything robust and I believe there are no big edge cases given files are simple (e.g none of the files actually contain ==> in them).
I started by trying to come up with a matching regex, and it will probably look something like this (==>.*\.sh <==)(.*)(==>.*\.sh <==), but I'm stuck on actually getting it to capture the filename and content and write them to a file.
Any ideas?
Assuming your backup file is named backup.txt:
perl -ne "if (/==> (\S+) <==/){open OUT,'>',$1;next}print OUT $_" backup.txt
The version above is for Windows.
On *nix, swap the quoting:
perl -ne 'if (/==> (\S+) <==/){open OUT,">",$1;next}print OUT $_' backup.txt
#!/bin/bash
while IFS= read -r line; do
if [[ $line =~ ^==\>[[:space:]](.*)[[:space:]]\<==$ ]]; then
out="${BASH_REMATCH[1]}"
continue
fi
printf "%s\n" "$line" >> "$out"
done < backup.txt
Drawback: extra blank line at the end of every created file except the last one.
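The extra blank line comes from tail itself, which prints an empty separator line before every header except the first. A variant that holds each line back by one iteration can drop exactly that separator. This is a sketch under stated assumptions: Bash, file names without embedded newlines, and original files that ended with a newline. restore_from_tail is a name invented for this example.

```shell
#!/usr/bin/env bash
# restore_from_tail BACKUP: recreate, in the current directory, the files
# recorded in the output of `tail -n +1 -- files...`.
restore_from_tail() {
    local header_re='^==> (.+) <==$'
    local line out="" held="" have_held=0
    while IFS= read -r line; do
        if [[ $line =~ $header_re ]]; then
            # A held empty line is tail's separator: drop it. Anything
            # else is real content of the previous file.
            if (( have_held )) && [ -n "$held" ]; then
                printf '%s\n' "$held" >> "$out"
            fi
            out=${BASH_REMATCH[1]}
            : > "$out"                 # create or truncate the target file
            have_held=0
            continue
        fi
        [ -n "$out" ] || continue      # skip anything before the first header
        if (( have_held )); then
            printf '%s\n' "$held" >> "$out"
        fi
        held=$line
        have_held=1
    done < "$1"
    # Flush the final held line of the last file.
    if (( have_held )); then
        printf '%s\n' "$held" >> "$out"
    fi
}

# Demo: back up two small scripts, then restore them into a fresh directory.
src=$(mktemp -d) && dst=$(mktemp -d)
printf '#!/bin/bash\necho a\n\n' > "$src/a.sh"
printf '#!/bin/bash\necho b\n' > "$src/b.sh"
( cd "$src" && tail -n +1 -- *.sh > "$dst/backup.txt" )
( cd "$dst" && restore_from_tail backup.txt )
```

Only one empty line is dropped per header, so a file that really ended with a blank line (a.sh in the demo) keeps it.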

Using a variable in sed search pattern when the value of the variable contains square brackets

What I'm trying to do is check that a file has been created. The best way I can think of is to list the files beforehand, list them again afterwards, delete the before list from the after list, and then see whether the after list is non-empty. I ran into trouble deleting the before list from the after list: filenames with square brackets were not being deleted from the list.
while read -r LINE
do
sed -i -- "/$LINE/d" listfilesafter.swp #without the -- I get 'sed: 1: "listfilesafter.swp": extra characters at the end of l command'
rm listfilesafter.swp--
done < listfilesbefore.swp
If I use '' then the variable doesn't get called, and the -r option on read doesn't seem to make it work like I expected. If anyone has any suggestions on alternative ways of doing this, do contribute, but I would still like to know how to use a variable in the search pattern when the value of the variable contains metacharacters. If anyone can help remove the code smell of "rm listfilesafter.swp--" then that would also be appreciated. Full code below:
cd ~/Desktop
ls >listfilesbefore.swp
#echo "balh blah" >SomeNonZeroFile.txt #comment or uncomment to test the if then statement
ls >listfilesafter.swp
sed -i -- '/listfilesafter.swp/d' listfilesafter.swp #deletes listfilesafter.swp from the list of files create after the event on line 3
while read -r LINE
do
sed -i -- "/$LINE/d" listfilesafter.swp #without the -- I get 'sed: 1: "listfilesafter.swp": extra characters at the end of l command'
rm listfilesafter.swp--
done < listfilesbefore.swp
cat listfilesafter.swp
echo "check listfiles. Enter to continue."
read dummy_variable
if [ -s listfilesafter.swp ]
then
rm listfilesbefore.swp
rm listfilesafter.swp
echo "success, the file was created"
else
rm listfilesbefore.swp
rm listfilesafter.swp
echo "failure, the file was not created"
fi
Given that you have two lists of files in sorted order (since ls lists the files in sorted order), you should probably be using a command like diff or, in this case, comm to find the differences between the two lists of files.
If you want to know which file(s) were created, then that's the list of files (lines) in the second file that are not in the first. With no options, comm lists the lines it reads in 3 columns:
lines in the first file not in the second
lines in the second file not in the first
lines in both files
You only need the lines (file names) in the second column, and therefore you want to suppress the list of files in the first and third columns, so you'll use comm -13 to do that:
before=$(mktemp ${TMPDIR:-/tmp}/files.XXXXXX)
after=$(mktemp ${TMPDIR:-/tmp}/files.XXXXXX)
trap "rm -f $before $after; exit 1" 0 1 2 3 13 15
ls > $before
…execute command that creates file(s)…
ls > $after
comm -13 $before $after
rm -f $before $after
trap 0
Obviously, you could capture the list of files from comm in a variable for further analysis, etc.
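As a sketch of that last point, the script below captures the comm output in a variable and uses it for the success/failure check from the question. The temp directory, the sample file names and the touch command that stands in for the file-creating step are all assumptions for the demo. LC_ALL=C is set so that ls and comm agree on the sort order.

```shell
#!/bin/sh
# Use one fixed collation order for both ls and comm.
LC_ALL=C
export LC_ALL

workdir=$(mktemp -d)                           # stands in for the watched directory
before=$(mktemp "${TMPDIR:-/tmp}/files.XXXXXX")  # the lists live outside workdir
after=$(mktemp "${TMPDIR:-/tmp}/files.XXXXXX")
cd "$workdir" || exit 1

touch 'old[1].txt' plain.txt
ls > "$before"

touch 'new[123]file.txt'    # stands in for the command that creates a file

ls > "$after"
# -13 suppresses columns 1 and 3, leaving lines found only in the second list.
created=$(comm -13 "$before" "$after")
rm -f "$before" "$after"

if [ -n "$created" ]; then
    echo "success, created: $created"
else
    echo "failure, the file was not created"
fi
```

Because comm compares whole lines as plain text, the square brackets in the names need no escaping.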
Making sed work when the search strings contain metacharacters
I'm still confused about sed. How do I use a variable in the search pattern of sed if the value contains metacharacters? Or in this case would I be better off using something other than sed?
In the scenario you have, you're far better off not using sed, and in any case your technique is horrendously slow if there are hundreds or thousands of files in the directory (running sed once per file name is not going to be fast).
However, supposing that it was necessary to use sed and that you wanted to deal with metacharacters in the file names in the list, then you would have to escape the metacharacters (with a backslash in front). I'd probably do something like this:
sed 's/[][\/*.]/\\&/g; s%.*%/^&$/d%' listfilesbefore.swp > script.sed
sed -f script.sed listfilesafter.swp
The first script takes any metacharacters in the line (file name) and replaces it with backslash-metacharacter. In the first substitute, the [][\/*.] character class matches square brackets, two types of slashes, stars and dots. Depending on the predilections of the variant of sed you're using, you might need to protect (){} with backslashes too, but in POSIX standard sed, the {} gain metacharacter meaning when prefixed with a backslash, so they're not modified by default. The second substitute takes the possibly modified line and converts it into a 'match and delete' command. The output, therefore, is a sed script that will delete the file names found in listfilesbefore.swp. The second command applies that script to listfilesafter.swp, doing in one sed command what your outline code does with one run of sed per file name.
Using sed to generate a sed script is a powerful technique. It isn't always appropriate, but when it is, it is very useful.
Shell script demo.sh
echo "Pre-populate the directory with some random file names"
for file in $(random -n 20 -T '%W%V%C-%w%v%c%v%c-%04[0000:9999]d.txt')
do
cp /dev/null $file
done
for template in '%w%v%w(%03[000:999]d)%w%v%w.txt' \
'%w%v%w[123]%w%v%we.txt' \
'%w%v%wfile*%03[0:999]d*.txt' \
'%w%v%w%v%c\\\%d.txt' \
'%w%v%w-{%04X}-{%04X}.txt'
do
for file in $(random -n 2 -T "$template")
do
cp /dev/null "$file"
done
done
ls > listfilesbefore.swp
ls
echo
echo "Create some new files with metacharacters in the names"
for file in 'new(123)file.txt' 'new[123]file.txt' 'newfile*321*.txt' \
'newfile\\\.txt' 'newfile-{A39F}-{B77D}.txt'
do
cp /dev/null "$file"
done
ls
ls > listfilesafter.swp
echo
echo "Create sed script"
sed 's/[][\/*.]/\\&/g; s%.*%/^&$/d%' listfilesbefore.swp > script.sed
echo
cat script.sed
echo
echo "Apply it"
sed -f script.sed listfilesafter.swp
The random command I'm using is of my own devising, but it is convenient for demonstrations such as this.
Example run
Pre-populate the directory with some random file names
AIG-taral-3486.txt
COV-oipuc-9088.txt
CUG-vowan-5758.txt
FEH-ieqek-0603.txt
IUS-aaduw-7080.txt
KER-jazuc-4824.txt
MIZ-iezec-8255.txt
NIT-kupib-6873.txt
PUX-oocov-2216.txt
QAW-xonod-3937.txt
QES-wawok-4790.txt
RON-difag-1986.txt
SAD-gesug-5706.txt
SAJ-luqoj-4311.txt
TUZ-wapaw-8547.txt
VAL-zutap-8054.txt
YIP-xudeb-7397.txt
YUP-uudiv-8848.txt
ZIB-jurax-2903.txt
ZUR-xonik-8800.txt
aavfile*147*.txt
demo.sh
diman\\\7115.txt
ganur\\\8732.txt
gud-{7049}-{3103}.txt
listfilesbefore.swp
lur[123]maee.txt
rivfile*065*.txt
ueo(417)yea.txt
uoi(751)qio.txt
woi-{37E8}-{009C}.txt
xof[123]hoxe.txt
Create some new files with metacharacters in the names
AIG-taral-3486.txt
COV-oipuc-9088.txt
CUG-vowan-5758.txt
FEH-ieqek-0603.txt
IUS-aaduw-7080.txt
KER-jazuc-4824.txt
MIZ-iezec-8255.txt
NIT-kupib-6873.txt
PUX-oocov-2216.txt
QAW-xonod-3937.txt
QES-wawok-4790.txt
RON-difag-1986.txt
SAD-gesug-5706.txt
SAJ-luqoj-4311.txt
TUZ-wapaw-8547.txt
VAL-zutap-8054.txt
YIP-xudeb-7397.txt
YUP-uudiv-8848.txt
ZIB-jurax-2903.txt
ZUR-xonik-8800.txt
aavfile*147*.txt
demo.sh
diman\\\7115.txt
ganur\\\8732.txt
gud-{7049}-{3103}.txt
listfilesbefore.swp
lur[123]maee.txt
new(123)file.txt
new[123]file.txt
newfile*321*.txt
newfile-{A39F}-{B77D}.txt
newfile\\\.txt
rivfile*065*.txt
ueo(417)yea.txt
uoi(751)qio.txt
woi-{37E8}-{009C}.txt
xof[123]hoxe.txt
Create sed script
/^AIG-taral-3486\.txt$/d
/^COV-oipuc-9088\.txt$/d
/^CUG-vowan-5758\.txt$/d
/^FEH-ieqek-0603\.txt$/d
/^IUS-aaduw-7080\.txt$/d
/^KER-jazuc-4824\.txt$/d
/^MIZ-iezec-8255\.txt$/d
/^NIT-kupib-6873\.txt$/d
/^PUX-oocov-2216\.txt$/d
/^QAW-xonod-3937\.txt$/d
/^QES-wawok-4790\.txt$/d
/^RON-difag-1986\.txt$/d
/^SAD-gesug-5706\.txt$/d
/^SAJ-luqoj-4311\.txt$/d
/^TUZ-wapaw-8547\.txt$/d
/^VAL-zutap-8054\.txt$/d
/^YIP-xudeb-7397\.txt$/d
/^YUP-uudiv-8848\.txt$/d
/^ZIB-jurax-2903\.txt$/d
/^ZUR-xonik-8800\.txt$/d
/^aavfile\*147\*\.txt$/d
/^demo\.sh$/d
/^diman\\\\\\7115\.txt$/d
/^ganur\\\\\\8732\.txt$/d
/^gud-{7049}-{3103}\.txt$/d
/^listfilesbefore\.swp$/d
/^lur\[123\]maee\.txt$/d
/^rivfile\*065\*\.txt$/d
/^ueo(417)yea\.txt$/d
/^uoi(751)qio\.txt$/d
/^woi-{37E8}-{009C}\.txt$/d
/^xof\[123\]hoxe\.txt$/d
Apply it
listfilesafter.swp
new(123)file.txt
new[123]file.txt
newfile*321*.txt
newfile-{A39F}-{B77D}.txt
newfile\\\.txt
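When the goal is only to drop lines that appear verbatim in another file, grep can treat the old names as fixed strings, which sidesteps the escaping problem entirely: -F turns off regex interpretation, -x requires whole-line matches, -v inverts the match, and -f reads the patterns from a file. A minimal sketch, using a few of the names from the run above written into a temp directory:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

printf '%s\n' 'aavfile*147*.txt' 'lur[123]maee.txt' 'ueo(417)yea.txt' \
    > listfilesbefore.swp
printf '%s\n' 'aavfile*147*.txt' 'lur[123]maee.txt' 'new[123]file.txt' \
    'newfile*321*.txt' 'ueo(417)yea.txt' > listfilesafter.swp

# Keep only the lines of the "after" list that are not in the "before" list.
new_files=$(grep -Fvxf listfilesbefore.swp listfilesafter.swp)
printf '%s\n' "$new_files"
```

Note that grep exits with status 1 when no line is selected, which here means no new file was found.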

Prefix and postfix elements of a bash array

I want to pre- and postfix an array in bash similar to brace expansion.
Say I have a bash array
ARRAY=( one two three )
I want to be able to pre- and postfix it like the following brace expansion
echo prefix_{one,two,three}_suffix
The best I've been able to find uses bash regex to either add a prefix or a suffix
echo ${ARRAY[@]/#/prefix_}
echo ${ARRAY[@]/%/_suffix}
but I can't find anything on how to do both at once. Potentially I could use regex captures and do something like
echo ${ARRAY[@]/.*/prefix_$1_suffix}
but it doesn't seem like captures are supported in bash variable regex substitution. I could also store a temporary array variable like
PRE=(${ARRAY[@]/#/prefix_})
echo ${PRE[@]/%/_suffix}
This is probably the best I can think of, but it still seems sub par. A final alternative is to use a for loop akin to
EXPANDED=""
for E in ${ARRAY[@]}; do
EXPANDED="prefix_${E}_suffix $EXPANDED"
done
echo $EXPANDED
but that is super ugly. I also don't know how I would get it to work if I wanted spaces anywhere in the prefix, suffix, or array elements.
The ${var/pattern/string} substitution in Bash is parameter expansion, and it doesn't use regexes. The pattern is just a shell glob, which is described in the bash manual under 3.5.8.1 Pattern Matching.
Your two-step solution is cool, but it needs some quotes for whitespace safety:
ARR_PRE=("${ARRAY[@]/#/prefix_}")
echo "${ARR_PRE[@]/%/_suffix}"
You can also do it in some evil way:
eval "something $(printf 'pre_%q_suf ' "${ARRAY[@]}")"
Your last loop could be done in a whitespace-friendly way with:
EXPANDED=()
for E in "${ARRAY[@]}"; do
EXPANDED+=("prefix_${E}_suffix")
done
echo "${EXPANDED[@]}"
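To see why the quotes matter, here is a small sketch of the two-step approach with an element that contains a space. It assumes Bash; tmp and the sample array contents are made up for the demonstration.

```shell
#!/usr/bin/env bash
ARRAY=( "one" "two words" "three" )

# Step 1: an empty pattern anchored at the start (#) prepends the prefix.
tmp=( "${ARRAY[@]/#/prefix_}" )
# Step 2: an empty pattern anchored at the end (%) appends the suffix.
EXPANDED=( "${tmp[@]/%/_suffix}" )

# Print one element per line, bracketed, to make the boundaries visible.
printf '<%s>\n' "${EXPANDED[@]}"
echo "${#EXPANDED[@]} elements"
```

With the quotes in place the array still has three elements; without them, "two words" would be split into two separate elements.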
Prettier but essentially the same as the loop solution:
$ ARRAY=(A B C)
$ mapfile -t -d $'\0' EXPANDED < <(printf "prefix_%s_postfix\0" "${ARRAY[@]}")
$ echo "${EXPANDED[@]}"
prefix_A_postfix prefix_B_postfix prefix_C_postfix
mapfile reads rows into elements of an array. With -d $'\0' it instead reads null-delimited strings and -t omits the delimiter from the result. See help mapfile.
For arrays:
ARRAY=( one two three )
(IFS=,; eval echo prefix_\{"${ARRAY[*]}"\}_suffix)
For strings:
STRING="one two three"
eval echo prefix_\{${STRING// /,}\}_suffix
eval causes its arguments to be evaluated twice. In both cases the first evaluation results in
echo prefix_{one,two,three}_suffix
and the second executes it.
In the array case a subshell is used to avoid overwriting IFS.
You can also do this in zsh:
echo ${${ARRAY[@]/#/prefix_}/%/_suffix}
Perhaps this would be the most elegant solution:
$ declare -a ARRAY=( one two three )
$ declare -p ARRAY
declare -a ARRAY=([0]="one" [1]="two" [2]="three")
$
$ IFS=$'\n' ARRAY=( $(printf 'prefix %s_suffix\n' "${ARRAY[@]}") )
$
$ declare -p ARRAY
declare -a ARRAY=([0]="prefix one_suffix" [1]="prefix two_suffix" [2]="prefix three_suffix")
$
$ printf '%s\n' "${ARRAY[@]}"
prefix one_suffix
prefix two_suffix
prefix three_suffix
$
Setting IFS=$'\n' in front of the array reassignment makes word splitting happen only at newlines, so spaces in the prefix, the suffix and the array elements are preserved. Note that, because the line consists only of assignments with no command, the new IFS value stays in effect in the current shell afterwards; save and restore IFS if that matters.
printf is handy here because it reapplies the format string (its first argument) to each additional argument it is given.
I had exactly the same question, and I came up with the following solution using sed's word boundary match mechanism:
myarray=( one two three )
newarray=( $(echo ${myarray[*]}|sed "s/\(\b[^ ]\+\)/pre-\1-post/g") )
echo ${newarray[@]}
> pre-one-post pre-two-post pre-three-post
echo ${#newarray[@]}
> 3
Waiting for more elegant solutions...