How to match the regex for the below pattern?

I am trying to write a script that works as described below, but I cannot get the syntax right.
I have folders named like S_12_O_319_K4me1.
Each folder contains a file named like S_12_O_319_K4me1_S12816.sorted.bam.
I want the script to loop over the folders, find the *.bam file inside the folder of the same name, and run an operation on it, but I cannot get the regex to work. This is what I tried:
#!/bin/bash
#$ -S /bin/bash
spp_run=/path/phantompeakqualtools/run_spp.R
bam_loc=/path/ChIP-Seq/output
samples="S_12_O_319_K27me3
S_12_O_319_K4me1
S_12_O_319_K4me3
S_12_O_319_K27ac"
for s in $samples; do
echo "Running SPP on $s ..."
Rscript $spp_run -c=$bam_loc/$s/${s}_S[[0-9]+\.sorted.bam -savp -out=$bam_loc/$s/${s}".run_spp.out"
done
The above pattern does not match the digits.
Where am I going wrong?
Edit:
I also tried the version below, but it still does not work. The argument is not parsed correctly by the Rscript, and I do not understand why:
#!/bin/bash
#$ -S /bin/bash
spp_run=/path/tools/phantompeakqualtools/run_spp.R
bam_loc=/path/ChIP-Seq/output
samples="S_12_O_319_K27me3
S_12_O_319_K4me1
S_12_O_319_K4me3"
for s in $samples; do
echo "Running SPP on $s ..."
echo $bam_loc/$s/${s}_S*.sorted.bam
inbam=$bam_loc/$s/${s}_S*.sorted.bam
echo $inbam
Rscript $spp_run -c=$inbam -savp -out=$bam_loc/$s/${s}".run_spp.out"
done
echo "done"
Error
Error in parse.arguments(args) :
ChIP File:/path/ChIP-Seq/output/S_12_O_319_K27me3/S_12_O_319_K27me3_S*.sorted.bam does not exist
Execution halted
Does not recognize the file even though $inbam value is /path/ChIP-Seq/output/S_12_O_319_K27me3/S_12_O_319_K27me3_S12815.sorted.bam

You can use a regex in a find command :
export spp_run=/path/phantompeakqualtools/run_spp.R
export bam_loc=/path/ChIP-Seq/output
export dir
samples=(S_12_O_319_K27me3 S_12_O_319_K4me1 S_12_O_319_K4me3 S_12_O_319_K27ac)
for dir in "${samples[@]}"; do
find . -type f -regex ".*/*${dir}_S[0-9]+\.sorted\.bam" \
-exec bash -c 'echo Rscript $spp_run -c=$bam_loc/${dir}/${1##*/} -savp -out=$bam_loc/${dir}/${dir}".run_spp.out"' _ {} \;
done
Note : just remove the echo before the Rscript if the output meets your needs.

I found an answer to my question; the code is below. Rscript needs the full path and file name, but inside -c=$inbam the wildcard is not expanded to the real file name, so the literal pattern was being passed. Assigning the output of the echo command to a variable gives the expanded path, and passing that variable as the input file argument lets Rscript find the file.
It is not elegant, but it works for me.
#!/bin/bash
#$ -S /bin/bash
spp_run=/path/tools/phantompeakqualtools/run_spp.R
bam_loc=/path/ChIP-Seq/output
samples="S_12_O_319_K27me3
S_12_O_319_K4me1
S_12_O_319_K4me3"
for s in $samples; do
echo "Running SPP on $s ..."
echo $bam_loc/$s/${s}_S*.sorted.bam
inbam=$bam_loc/$s/${s}_S*.sorted.bam
echo $inbam
infile=`echo $inbam`
Rscript $spp_run -c=$infile -savp -out=$bam_loc/$s/${s}".run_spp.out"
done
echo "done"
Thanks everyone for the suggestions and comments. My code is not elegant but it is working so I put the answer here.
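For reference, a minimal sketch of why the wildcard survived in -c=$inbam, and of one way to expand it without the echo round trip. It assumes default shell options (no nullglob or failglob); the temporary directory and the sample file it creates are made up for the demo.

```shell
# Demo directory with one .bam file (hypothetical names, removed at the end)
demo_dir=$(mktemp -d)
s=S_12_O_319_K4me1
mkdir -p "$demo_dir/$s"
: > "$demo_dir/$s/${s}_S12816.sorted.bam"

pattern="$demo_dir/$s/${s}_S*.sorted.bam"

# The shell tries to glob the whole word "-c=<pattern>". No file name starts
# with "-c=", so nothing matches and the pattern is passed through unchanged.
unexpanded=$(printf '%s\n' -c=$pattern)

# Expanding the pattern as a word of its own yields the real path(s);
# "set --" stores them in the positional parameters.
set -- $pattern
inbam=$1

printf 'as one word : %s\n' "$unexpanded"
printf 'expanded    : -c=%s\n' "$inbam"
rm -rf "$demo_dir"
```

With the expanded path in a variable, the call would look like Rscript $spp_run -c="$inbam" ... as in the script above. If more than one file matches, only the first is used here.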

Related

How to rename files using certain string inside each file?

I have a list of html files that contain a certain tag in each file as below:
<div id="myID" style="display:none">1_34876</div>
I would like to search for that tag in each file and rename each file according to the number within that tag, i.e. rename the file containing the tag above to 1_34876.html, and so forth.
Is there a regex or bash command using grep or awk that can accomplish this?
So far I was able to grep each file using the following command but stuck on how to rename the files:
grep '<div id="myID" style="display:none">.*</div>' ./*.html
An additional bonus would be if the command doesn't overwrite duplicate files, e.g. if another file contains the 1_34876 tag above then the second file would be renamed as 1_34876 v2.html or something similar.
Kindly advise whether this can be achieved in a way that doesn't require programming.
Many thanks indeed.
Ali

You can use the following script to achieve your goal. Note that it relies on grep -P, which the grep shipped with macOS does not support; there you have to install GNU grep via Homebrew and replace the grep call with ggrep.
The script searches the current directory and all its subdirectories for *.html files.
It renames only the files that contain the specific tag.
When multiple files contain the same tag, each subsequent file after the first gets a counter appended to its name, e.g. 1_234.html, 1_234_1.html, 1_234_2.html.
For files that contain multiple tags, the first tag encountered is used.
#!/bin/bash
rename_file ()
{
# Check that file name received is an existing regular file
file_name="$(realpath "${1}")"
if [ ! -f "${file_name}" ]; then
echo "No argument or non existing file or non regular file provided"
exit 1
fi
# Get the tag number. If the number does not exist, the variable tag will be
# empty. The first tag on a file will be used if there are multiple tags
# within a file.
tag="$(grep -oP -m 1 '(?<=<div id="myID" style="display:none">).*?(?=</div>)' \
-- "${file_name}")"
# Rename the file only if it contained a tag
if [ -n "${tag}" ]; then
file_path="$(dirname "${file_name}")"
# Change directory to the file's location silently
pushd "${file_path}" > /dev/null || return
# Check for multiple occurrences of files with the same tag
if [ -e "${tag}.html" ]; then
# The parentheses make -type f apply to both name patterns
counter="$(find ./ -maxdepth 1 -type f \( -name "${tag}.html" -o -name "${tag}_*.html" \) | wc -l)"
tag="${tag}_${counter}"
fi
# Rename the file
mv "${file_name}" "${tag}.html"
# Return to previous directory silently
popd > /dev/null || return
fi
}
# Necessary in order to call rename_file from find command within main
export -f rename_file
# The entry point function of the script. This function searches for all the
# html files in the directory that the script is run, and all subdirectories.
# The function calls rename_file upon each of the found files.
main ()
{
find ./ -type f -name "*.html" -exec bash -c 'rename_file "${1}"' _ {} \;
}
main
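If installing GNU grep is not an option, the tag can probably be pulled out with plain sed instead of grep -P. A minimal sketch, assuming the tag sits on a single line exactly as in the question; the temporary file is made up for the demo.

```shell
# Sample page (hypothetical content around the tag from the question)
page=$(mktemp)
printf '%s\n' '<html>' \
    '<div id="myID" style="display:none">1_34876</div>' \
    '</html>' > "$page"

# -n with the p flag prints only lines where the substitution succeeded;
# the capture group keeps the text between the opening tag and </div>.
# head -n 1 keeps the first tag if the file has several.
tag=$(sed -n 's/.*<div id="myID" style="display:none">\([^<]*\)<\/div>.*/\1/p' "$page" | head -n 1)

printf 'tag: %s\n' "$tag"
rm -f "$page"
```

The result could stand in for the grep -oP assignment in rename_file; the rest of the function would stay the same.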

As we do not know the exact content of the files and folders, we need to take care of the following:
Check that the tag is unique
A file with the new name does not already exist
Do not rename already renamed files
Steps:
Search for files with the specified tag: grep -o '<div id="myID" style="display:none">.*</div>' ./*.html
This gives output in the format file_path:tag
Extract file_path using the regexp [^:]+ (one or more characters that are not ':')
with -o (print only the matching part) and take the first occurrence using head -1
Get the new file name from the tag using the regexp ">.*<" and then remove the "<>" using sed
We could use a single regexp that skips the <>, but it is more complex and less readable: grep -oP ">\K.*(?=<)"
Rename the files, log errors and script actions, and return a non-zero exit code in case of issues
Unless it is really required, it is better to use functions with local variable declarations. Declare them using function fn_rename_html_files {} instead of fn_rename_html_files() {}; otherwise, with complex logic and nested functions, one function may change another function's variables and debugging will not be so easy.
function fn_rename_html_files {
typeset l_files_path="${1-.}"
typeset l_html_ext=".html"
typeset l_proc_exit_code=0
typeset l_proc_name="${FUNCNAME[0]}"
typeset l_files_to_search="${l_files_path}/*${l_html_ext}"
typeset l_tag_reg_exp='<div id="myID" style="display:none" *>.*</div>'
typeset l_matched_files=$(grep -o "$l_tag_reg_exp" $l_files_to_search | sort -u)
typeset l_prev_file_path=""
# loop through matching files
while IFS= read -r l_file_path_with_match_tag
do
l_current_file_path=$(echo "$l_file_path_with_match_tag" | egrep -o "([^:])+" | head -1 )
l_current_file_name=$(basename "$l_current_file_path")
#echo "l_current_file_path $l_current_file_path "
#echo "l_current_file_name $l_current_file_name "
l_new_file_name_from_tag=$(echo "$l_file_path_with_match_tag" | egrep -o ">.*<" | sed "s/[><]//g" | head -1 )
l_new_file_name="${l_new_file_name_from_tag}${l_html_ext}"
l_new_file_path="$l_files_path/$l_new_file_name"
#echo "$l_new_file_path $l_new_file_path "
#echo "$l_new_file_name $l_new_file_name "
if [[ "$l_prev_file_path" == "$l_current_file_path" ]]; then
echo "$l_proc_name ERROR: myID tag is not unique for $l_current_file_path, skipping second renaming " >&2
let l_proc_exit_code+=1
continue
fi
if ! [[ -f "$l_new_file_path" ]]; then
if mv "$l_current_file_path" "$l_new_file_path" ; then
echo "$l_proc_name SUCCESS: renamed $l_current_file_path to $l_new_file_path"
else
echo "$l_proc_name ERROR: cannot move $l_current_file_path to $l_new_file_path" >&2
let l_proc_exit_code+=1
fi
else
if [[ "$l_current_file_name" == "$l_new_file_name" ]]; then
echo "$l_proc_name Warning: File has been already renamed: $l_current_file_name, skipping"
else
echo "$l_proc_name Warning: File with such name already exists: $l_current_file_name, skipping" >&2
let l_proc_exit_code+=1
fi
fi
l_prev_file_path=$l_current_file_path
done <<< "$l_matched_files"
return $l_proc_exit_code
}
# create test files
rm *.html
echo '<div id="myID" style="display:none">1_1</div>' > 1.html ;
echo '<div id="myID" style="display:none">1_2</div>' > 2.html ;
echo '<div id="myID" style="display:none">1_3</div>' > 3.html ;
echo '<div id="myID" style="display:none">1_3_1</div>' >> 3.html ;
# run
fn_rename_html_files

Source and run shell function within perl regex

The Problem
I am attempting to reuse a shell function that I defined in a bash script later on in the same script, within a perl command execution block. The call to perl basically needs to run the defined shell function after matching a piece of the regex (capture group #2). See the code definitions below.
The Code
The pertinent function definition in bash shell script:
evalPS() {
PS_ARGS=$(eval 'echo -en "'${1}'"' | sed -e 's#\\\[##g' -e 's#\\\]##g')
PS_STR=$((set +x; (PS4="+.$PS_ARGS"; set -x; :) 2>&1) | cut -d':' -f1 | cut -d'.' -f2)
echo -en "${PS_STR}"
}
The definition above uses some bashisms and hacks to evaluate the users real prompt to a string.
That function needs to be called within perl in the next function:
remPS() {
# store evalPS definition
EVALPS_SOURCE=$(declare -f evalPS)
# initialize prompt
eval "$PROMPT_COMMAND" &> /dev/null
# handle args
( [[ $# -eq 0 ]] && cat - || cat "${1}" ) |
# ridiculous regex
perl -pe 's/^[^\e].*//gs' |
perl -s -0777 -e '`'"$EVALPS_SOURCE"'`; while (<>) { s%(.*?\Q$ps1\E)(?{`eval $PROMPT_COMMAND`})|(.*?\Q$ps2\E)(?{$ps2=`$\(evalPS "${PS2}"\)`})|(\Q$ps3\E)|(^\Q$ps4\E+.*?\b)%%g; } continue {print;}' -- \
-ps1="$(evalPS "${PS1}")" -ps2="$(evalPS "${PS2}")" \
-ps3="${PS3}" -ps4="${PS4:0:1}" |
perl -pe 's/(.*?)\e\[[A-Z](\a)*/\1/g'
}
The call to perl could be moved to a separate script, but either way the issue is that I cannot find a way to "import" or "source" the evalPS() function within the perl script. I also tried sourcing the function from a separate file definition into the perl command, like so:
perl -s -0777 -e '`. /home/anon/Desktop/flyball_labs/scripts/recsesh_lib.sh`; while (<>) { s%(.*?\Q$ps1\E)(?{`eval $PROMPT_COMMAND`})|(.*?\Q$ps2\E)(?{$ps2=`$\(evalPS "${PS2}"\)`})|(\Q$ps3\E)|(^\Q$ps4\E+.*?\b)%%g; } continue {print;}'
...
Or using the source builtin:
perl -s -0777 -e '`source /home/anon/Desktop/flyball_labs/scripts/recsesh_lib.sh`; while (<>) { s%(.*?\Q$ps1\E)(?{`eval $PROMPT_COMMAND`})|(.*?\Q$ps2\E)(?{$ps2=`$\(evalPS "${PS2}"\)`})|(\Q$ps3\E)|(^\Q$ps4\E+.*?\b)%%g; } continue {print;}'
...
And for clarity, the final attempt was passing the function declaration into perl like so:
perl -s -0777 -e '`'"$EVALPS_SOURCE"'`; while (<>) { s%(.*?\Q$ps1\E)(?{`eval $PROMPT_COMMAND`})|(.*?\Q$ps2\E)(?{$ps2=`$\(evalPS "${PS2}"\)`})|(\Q$ps3\E)|(^\Q$ps4\E+.*?\b)%%g; } continue {print;}'
...
The Results
No luck in any of the above cases. It seems that the . cmd runs whereas the source cmd does not, and passing the declaration of the function into perl may not be possible with this syntax, as shown by the output of my tests:
Sourcing library definitions w/ source cmd
(1)|anon#devbox /tmp|$ remPS "${TEXT_FILE}"
sh: 1: source: not found
...
Sourcing w/ shell . cmd
(1)|anon#devbox /tmp|$ remPS "${TEXT_FILE}"
sh: 1: evalPS: not found
...
Passing declaration to perl
(1)|anon#devbox /tmp|$ remPS "${TEXT_FILE}"
sh: 3: Syntax error: ")" unexpected (expecting "}")
sh: 1: evalPS: not found
...
To summarize
Q) How to "import" and run a user defined shell command within perl?
A) 2 Possible solutions:
source the function from separate file definition
pass into perl command from bash using variable expansion
Sources & Research
Evaluating real bash prompt value:
how-to-print-current-bash-prompt echo-expanded-ps1
Note: I chose this implementation of evalPS() because using the script cmd workaround was unreliable and using call bind_variable() bash function required root privileges (effectively changing user's prompt).
Perl regex embedded code
Note: the function has to be run after every match of $PS2 to re-evaluate the new prompt and effectively match the next iteration (as it would in a real shell session). The use case I have for this is many people have (including myself) set their $PROMPT_COMMAND to iterate an integer indicating which line number (or offset from $PS1) the current line is, and displayed within $PS2.
running a shell command in perl
Sourcing shell code in perl:
how-to-run-source-command-linux-from-a-perl-script can-we-source-a-shell-script-in-perl-script sourcing-a-shell-script-from-a-perl-script
Alternatively if anyone knows how to translate my implementation of evalPS() into perl code, that would work too, but I believe this is impossible because the evaluated string is obtained using a "set hack" which as far as I know is strictly a bashism.
how-can-i-translate-a-shell-script-to-perl
Any suggestions would be much appreciated!
Edit
Some more info on the data being parsed..
The text file looks like the following (cat -A output):
^[]0;anon# - 3^G^[[1m^[[31m(^[[36m1^[[31m)|^[[33manon^[[32m#^[[34mdevbox ^[[35m/tmp^[[31m|^[[36m^[[37m$ ^[(B^[[mecho test^M$
test^M$
^[[1m^[[31m(^[[36m1^[[31m)|^[[33manon^[[32m#^[[34mdevbox ^[[35m/tmp^[[31m|^[[36m^[[37m$ ^[(B^[[mecho \^M$
^[[1m^[[31m[^[[36m2^[[31m]|^[[33m-^[[32m-^[[34m-^[[35m> ^[(B^[[m\^M$
^[[1m^[[31m[^[[36m3^[[31m]|^[[33m-^[[32m-^[[34m-^[[35m> ^[(B^[[m\^M$
^[[1m^[[31m[^[[36m4^[[31m]|^[[33m-^[[32m-^[[34m-^[[35m> ^[(B^[[mtest^M$
test^M$
^[[1m^[[31m(^[[36m1^[[31m)|^[[33manon^[[32m#^[[34mdevbox ^[[35m/tmp^[[31m|^[[36m^[[37m$ ^[(B^[[mexit^M$
exit^M$
Or similarly (less formatted):
ESC]0;anon# - 3^GESC[1mESC[31m(ESC[36m1ESC[31m)|ESC[33manonESC[32m#ESC[34mdevbox ESC[35m/tmpESC[31m|ESC[36mESC[37m$ ESC(BESC[mecho test
test
ESC[1mESC[31m(ESC[36m1ESC[31m)|ESC[33manonESC[32m#ESC[34mdevbox ESC[35m/tmpESC[31m|ESC[36mESC[37m$ ESC(BESC[mecho \
ESC[1mESC[31m[ESC[36m2ESC[31m]|ESC[33m-ESC[32m-ESC[34m-ESC[35m> ESC(BESC[m\
ESC[1mESC[31m[ESC[36m3ESC[31m]|ESC[33m-ESC[32m-ESC[34m-ESC[35m> ESC(BESC[m\
ESC[1mESC[31m[ESC[36m4ESC[31m]|ESC[33m-ESC[32m-ESC[34m-ESC[35m> ESC(BESC[mtest
test
ESC[1mESC[31m(ESC[36m1ESC[31m)|ESC[33manonESC[32m#ESC[34mdevbox ESC[35m/tmpESC[31m|ESC[36mESC[37m$ ESC(BESC[mexit
exit
My $PROMPT_COMMAND and corresponding prompts ($PS1-$PS4) for example:
PROMPT_COMMAND='TERM_LINE_NO=1'
PS1="\[$(tput bold)\]\[$(tput setaf 1)\](\[$(tput setaf 6)\]\${TERM_LINE_NO}\[$(tput setaf 1)\])|\[$(tput setaf 3)\]\u\[$(tput setaf 2)\]#\[$(tput setaf 4)\]\h \[$(tput setaf 5)\]\w\[$(tput setaf 1)\]|\[$(tput setaf 6)\]\$(parse_git_branch)\[$(tput setaf 7)\]\\$ \[$(tput sgr0)\]"
PS2="\[$(tput bold)\]\[$(tput setaf 1)\][\[$(tput setaf 6)\]\$((++TERM_LINE_NO))\[$(tput setaf 1)\]]|\[$(tput setaf 3)\]-\[$(tput setaf 2)\]-\[$(tput setaf 4)\]-\[$(tput setaf 5)\]> \[$(tput sgr0)\]"
PS3=""
PS4="+ "

The answer was to scrap this whole idea and use a better one.
Let's step back first. Big picture:
The goal was to make the script program output an executable shell script of the entire recorded session.
Back to the answers.
The above implementation was supposed to remove all prompts and control characters from the output of script (the input examples I gave) and then remove the output of each command (i.e. any line that didn't contain control characters).
Passing the evalPS function to perl to execute proved to be quite redundant, and getting bash and perl to expand the parameters correctly was a nightmare.
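For what it is worth, a likely reason the sourcing attempts failed: perl runs every backtick command in its own child process (through sh, judging by the sh: prefix in the errors in the question), so a function defined by one backtick call is gone by the next. A minimal sketch of that effect in plain shell; the library file and greet_demo are made-up stand-ins for recsesh_lib.sh and evalPS.

```shell
# Hypothetical library file holding one function
lib=$(mktemp)
cat > "$lib" <<'EOF'
greet_demo() { printf 'prompt:%s\n' "$1"; }
EOF

# Two separate child shells, like two separate backtick calls:
sh -c ". '$lib'"        # defines the function, then the shell exits
second=$(sh -c 'greet_demo x' 2>&1) || second="not found"

# One child shell that sources and calls in the same command line:
combined=$(sh -c ". '$lib'; greet_demo x")

printf 'separate calls: %s\n' "$second"
printf 'combined call : %s\n' "$combined"
rm -f "$lib"
```

So sourcing could only have helped if the . line and the evalPS call had been in the same backtick string, and evalPS itself uses bash-only syntax, so that string would also have needed to run under bash rather than sh.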
The Final Solution
Scrapped the perl regex idea and used a combination of subshell and history redirection to grab the commands for the entire script session, while it was running.
The entire implementation looks like this:
# log cmds to script file as they are entered (unbuffered)
# spawn script cmd in subshell and wait for it to finish
wait -n
(
history -c
export HISTFILE="${SCRIPT_FILE}"
shopt -s histappend
script -q --timing="${TIME_FILE}" "${REC_FILE}"
history -a
)
...
Simple and much easier to read! :)
Hope this helps anyone trying to make their own mods to script in the future, cheers!

how to substitute the variable in shell script

In the script below, I want to add the file name as a prefix to each output line generated by grep.
I don't know why $file is not being replaced with the file name; I am getting $file as the prefix. Can anyone help me with this?
function traverse() {
for file in "$1"/*
do
if [ ! -d "${file}" ] ; then
if [ ${file: -2} == ".c" ]
then
./sed.sed "$file" > latest_log.txt #To remove all the comments
grep -nir "$2" latest_log.txt >> output.log #grep matched lines
sed -i "s/^/$file/" output.log > grepoutput3.txt #prefix filename($file here)
echo "${file} is a c file"
fi
else
traverse "${file}" "$2"
fi
done
}
function main() {
traverse "$1" "$2"
}
main "$1" "$2"
The line below should add the file name as a prefix, but $file is not replaced. Apart from that, the whole script works fine.
sed -i "s/^/$file/" ex.txt > grepoutput3.txt
EX: search for "welcome" in all .c files of a folder. Take first_file.c as one of them.
FIRST_FILE.C
welcome here
/* welcome here */
//welcome here
welcome here2
Expected output
/DIR/FIRST_FILE.C:1: welcome here
/DIR/FIRST_FILE.C:4: welcome here2
With my script, the output is
1: welcome here
4: welcome here2
So I am trying to prefix the file name (/DIR/FIRST_FILE.C) to each line. We can fetch that file name from $file, but sed is not interpreting it.

Nowhere in your script is any code that creates ex.txt. Is that intended?
To debug this, run your script under set -x near the beginning.
Also worth noting: you have tagged this question "sh" (not bash) but
if [ ${file: -2} == ".c" ]
is full of nonportable bashisms: the string equality operator is = and the ${file: -2} syntax is not understood by a POSIX sh. An equivalent sh code would use
case $file in
(*.c) ...;;
esac
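Two more things in the sed line look suspect, independent of the ex.txt question: $file contains slashes, which clash with the / delimiter of the s command, and -i makes sed edit the file in place, so nothing is written to the redirected output. A minimal sketch of a prefixing step that avoids both, using the sample lines from the question; the temporary file is made up, and it assumes the path contains no | or & characters.

```shell
file=/DIR/FIRST_FILE.C
log=$(mktemp)
printf '%s\n' '1: welcome here' '4: welcome here2' > "$log"

# "|" as the delimiter keeps the slashes in $file from ending the s command.
# No -i: sed writes the result to stdout, which can then be captured or redirected.
prefixed=$(sed "s|^|$file:|" "$log")

printf '%s\n' "$prefixed"
rm -f "$log"
```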

For the filesystem traversal: Use find!
find FOLDER -type f -name '*.c' -exec bash yourscript.sh {} \;
In yourscript.sh you can access the filename in $1.
I could help further with implementing that script but it is pretty unclear what you are trying to do.

Using a variable in sed search pattern when the value of the variable contains square brackets

What I'm trying to do is check that a file has been created. The best way I can think of is to list the files beforehand, list them again afterwards, delete the before list from the after list, and then see whether the after list is non-empty. I ran into trouble deleting the before list from the after list: filenames with square brackets were not being deleted from the list.
while read -r LINE
do
sed -i -- "/$LINE/d" listfilesafter.swp #without the -- I get 'sed: 1: "listfilesafter.swp": extra characters at the end of l command'
rm listfilesafter.swp--
done < listfilesbefore.swp
If I use '' then the variable doesn't get expanded, and the -r option on read doesn't seem to make it work like I expected. If anyone has suggestions for alternative ways of doing this, do contribute, but I would still like to know how to use a variable in the search pattern when the value of the variable contains metacharacters. If anyone can help remove the code smell of "rm listfilesafter.swp--", that would also be appreciated. Full code below:
cd ~/Desktop
ls >listfilesbefore.swp
#echo "balh blah" >SomeNonZeroFile.txt #comment or uncomment to test the if then statement
ls >listfilesafter.swp
sed -i -- '/listfilesafter.swp/d' listfilesafter.swp #deletes listfilesafter.swp from the list of files create after the event on line 3
while read -r LINE
do
sed -i -- "/$LINE/d" listfilesafter.swp #without the -- I get 'sed: 1: "listfilesafter.swp": extra characters at the end of l command'
rm listfilesafter.swp--
done < listfilesbefore.swp
cat listfilesafter.swp
echo "check listfiles. Enter to continue."
read dummy_variable
if [ -s listfilesafter.swp ]
then
rm listfilesbefore.swp
rm listfilesafter.swp
echo "success, the file was created"
else
rm listfilesbefore.swp
rm listfilesafter.swp
echo "failure, the file was not created"
fi

Given that you have two lists of files in sorted order (since ls lists the files in sorted order), you should probably be using a command like diff or, in this case, comm to find the differences between the two lists of files.
If you want to know which file(s) were created, then that's the list of files (lines) in the second file that are not in the first. With no options, comm lists the lines it reads in 3 columns:
lines in the first file not in the second
lines in the second file not in the first
lines in both files
You only need the lines (file names) in the second column, and therefore you want to suppress the list of files in the first and third columns, so you'll use comm -13 to do that:
before=$(mktemp ${TMPDIR:-/tmp}/files.XXXXXX)
after=$(mktemp ${TMPDIR:-/tmp}/files.XXXXXX)
trap "rm -f $before $after; exit 1" 0 1 2 3 13 15
ls > $before
…execute command that creates file(s)…
ls > $after
comm -13 $before $after
rm -f $before $after
trap 0
Obviously, you could capture the list of files from comm in a variable for further analysis, etc.
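A minimal sketch of that capture, feeding the success/failure check from the question. The work directory and the file names in it are made up for the demo, and the list files are kept outside the listed directory so they do not show up in the listing themselves.

```shell
workdir=$(mktemp -d)    # stand-in for the directory being watched
before=$(mktemp)
after=$(mktemp)

: > "$workdir/old.txt"
ls "$workdir" > "$before"

# Stand-in for the command that creates file(s)
: > "$workdir/new[123]file.txt"

ls "$workdir" > "$after"

# Lines only in the second list, i.e. names that appeared in between
new_files=$(comm -13 "$before" "$after")

if [ -n "$new_files" ]; then
    printf 'success, created: %s\n' "$new_files"
else
    echo "failure, the file was not created"
fi
rm -rf "$workdir" "$before" "$after"
```

Since comm compares whole lines literally, the square brackets in the name need no escaping.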
Making sed work when the search strings contain metacharacters
I'm still confused about sed. How do I use a variable in the search pattern of sed if the value contains metacharacters? Or in this case would I be better off using something other than sed?
In the scenario you have, you're far better off not using sed, and in any case your technique is horrendously slow if there are hundreds or thousands of files in the directory (running sed once per file name is not going to be fast).
However, supposing that it was necessary to use sed and that you wanted to deal with metacharacters in the file names in the list, then you would have to escape the metacharacters (with a backslash in front). I'd probably do something like this:
sed 's/[][\/*.]/\\&/g; s%.*%/^&$/d%' listfilesbefore.swp > script.sed
sed -f script.sed listfilesafter.swp
The first script takes any metacharacters in the line (file name) and replaces it with backslash-metacharacter. In the first substitute, the [][\/*.] character class matches square brackets, two types of slashes, stars and dots. Depending on the predilections of the variant of sed you're using, you might need to protect (){} with backslashes too, but in POSIX standard sed, the {} gain metacharacter meaning when prefixed with a backslash, so they're not modified by default. The second substitute takes the possibly modified line and converts it into a 'match and delete' command. The output, therefore, is a sed script that will delete the file names found in listfilesbefore.swp. The second command applies that script to listfilesafter.swp, doing in one sed command what your outline code does with one run of sed per file name.
Using sed to generate a sed script is a powerful technique. It isn't always appropriate, but when it is, it is very useful.
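If the goal is only to drop the known names from the second list, grep with fixed-string matching may be a simpler option than escaping anything. This is a different tool, not the sed technique described above. A minimal sketch with made-up list files, using names like those in the demo below:

```shell
before=$(mktemp)
after=$(mktemp)
printf '%s\n' 'aavfile*147*.txt' 'lur[123]maee.txt' > "$before"
printf '%s\n' 'aavfile*147*.txt' 'lur[123]maee.txt' 'new[123]file.txt' > "$after"

# -F: patterns are literal strings, not regexes
# -x: a pattern must match the whole line
# -v: print the lines that do NOT match
# -f: read the patterns from a file
remaining=$(grep -Fxv -f "$before" "$after")

printf '%s\n' "$remaining"
rm -f "$before" "$after"
```

Unlike comm, this does not need the lists to be sorted.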
Shell script demo.sh
echo "Pre-populate the directory with some random file names"
for file in $(random -n 20 -T '%W%V%C-%w%v%c%v%c-%04[0000:9999]d.txt')
do
cp /dev/null $file
done
for template in '%w%v%w(%03[000:999]d)%w%v%w.txt' \
'%w%v%w[123]%w%v%we.txt' \
'%w%v%wfile*%03[0:999]d*.txt' \
'%w%v%w%v%c\\\%d.txt' \
'%w%v%w-{%04X}-{%04X}.txt'
do
for file in $(random -n 2 -T "$template")
do
cp /dev/null "$file"
done
done
ls > listfilesbefore.swp
ls
echo
echo "Create some new files with metacharacters in the names"
for file in 'new(123)file.txt' 'new[123]file.txt' 'newfile*321*.txt' \
'newfile\\\.txt' 'newfile-{A39F}-{B77D}.txt'
do
cp /dev/null "$file"
done
ls
ls > listfilesafter.swp
echo
echo "Create sed script"
sed 's/[][\/*.]/\\&/g; s%.*%/^&$/d%' listfilesbefore.swp > script.sed
echo
cat script.sed
echo
echo "Apply it"
sed -f script.sed listfilesafter.swp
The random command I'm using is of my own devising, but it is convenient for demonstrations such as this.
Example run
Pre-populate the directory with some random file names
AIG-taral-3486.txt
COV-oipuc-9088.txt
CUG-vowan-5758.txt
FEH-ieqek-0603.txt
IUS-aaduw-7080.txt
KER-jazuc-4824.txt
MIZ-iezec-8255.txt
NIT-kupib-6873.txt
PUX-oocov-2216.txt
QAW-xonod-3937.txt
QES-wawok-4790.txt
RON-difag-1986.txt
SAD-gesug-5706.txt
SAJ-luqoj-4311.txt
TUZ-wapaw-8547.txt
VAL-zutap-8054.txt
YIP-xudeb-7397.txt
YUP-uudiv-8848.txt
ZIB-jurax-2903.txt
ZUR-xonik-8800.txt
aavfile*147*.txt
demo.sh
diman\\\7115.txt
ganur\\\8732.txt
gud-{7049}-{3103}.txt
listfilesbefore.swp
lur[123]maee.txt
rivfile*065*.txt
ueo(417)yea.txt
uoi(751)qio.txt
woi-{37E8}-{009C}.txt
xof[123]hoxe.txt
Create some new files with metacharacters in the names
AIG-taral-3486.txt
COV-oipuc-9088.txt
CUG-vowan-5758.txt
FEH-ieqek-0603.txt
IUS-aaduw-7080.txt
KER-jazuc-4824.txt
MIZ-iezec-8255.txt
NIT-kupib-6873.txt
PUX-oocov-2216.txt
QAW-xonod-3937.txt
QES-wawok-4790.txt
RON-difag-1986.txt
SAD-gesug-5706.txt
SAJ-luqoj-4311.txt
TUZ-wapaw-8547.txt
VAL-zutap-8054.txt
YIP-xudeb-7397.txt
YUP-uudiv-8848.txt
ZIB-jurax-2903.txt
ZUR-xonik-8800.txt
aavfile*147*.txt
demo.sh
diman\\\7115.txt
ganur\\\8732.txt
gud-{7049}-{3103}.txt
listfilesbefore.swp
lur[123]maee.txt
new(123)file.txt
new[123]file.txt
newfile*321*.txt
newfile-{A39F}-{B77D}.txt
newfile\\\.txt
rivfile*065*.txt
ueo(417)yea.txt
uoi(751)qio.txt
woi-{37E8}-{009C}.txt
xof[123]hoxe.txt
Create sed script
/^AIG-taral-3486\.txt$/d
/^COV-oipuc-9088\.txt$/d
/^CUG-vowan-5758\.txt$/d
/^FEH-ieqek-0603\.txt$/d
/^IUS-aaduw-7080\.txt$/d
/^KER-jazuc-4824\.txt$/d
/^MIZ-iezec-8255\.txt$/d
/^NIT-kupib-6873\.txt$/d
/^PUX-oocov-2216\.txt$/d
/^QAW-xonod-3937\.txt$/d
/^QES-wawok-4790\.txt$/d
/^RON-difag-1986\.txt$/d
/^SAD-gesug-5706\.txt$/d
/^SAJ-luqoj-4311\.txt$/d
/^TUZ-wapaw-8547\.txt$/d
/^VAL-zutap-8054\.txt$/d
/^YIP-xudeb-7397\.txt$/d
/^YUP-uudiv-8848\.txt$/d
/^ZIB-jurax-2903\.txt$/d
/^ZUR-xonik-8800\.txt$/d
/^aavfile\*147\*\.txt$/d
/^demo\.sh$/d
/^diman\\\\\\7115\.txt$/d
/^ganur\\\\\\8732\.txt$/d
/^gud-{7049}-{3103}\.txt$/d
/^listfilesbefore\.swp$/d
/^lur\[123\]maee\.txt$/d
/^rivfile\*065\*\.txt$/d
/^ueo(417)yea\.txt$/d
/^uoi(751)qio\.txt$/d
/^woi-{37E8}-{009C}\.txt$/d
/^xof\[123\]hoxe\.txt$/d
Apply it
listfilesafter.swp
new(123)file.txt
new[123]file.txt
newfile*321*.txt
newfile-{A39F}-{B77D}.txt
newfile\\\.txt

BASH: How to rename lots of files inserting folder name in middle of filename

(I'm in a Bash environment, Cygwin on a Windows machine, with awk, sed, grep, perl, etc...)
I want to add the last folder name to the filename, just before the last underscore (_) followed by numbers or at the end if no numbers are in the filename.
Here is an example of what I have (hundreds of files needed to be reorganized) :
./aaa/A/C_17x17.p
./aaa/A/C_32x32.p
./aaa/A/C.p
./aaa/B/C_12x12.p
./aaa/B/C_4x4.p
./aaa/B/C_A_3x3.p
./aaa/B/C_X_91x91.p
./aaa/G/C_6x6.p
./aaa/G/C_7x7.p
./aaa/G/C_A_113x113.p
./aaa/G/C_A_8x8.p
./aaa/G/C_B.p
./aab/...
I would like to rename all these files like this :
./aaa/C_A_17x17.p
./aaa/C_A_32x32.p
./aaa/C_A.p
./aaa/C_B_12x12.p
./aaa/C_B_4x4.p
./aaa/C_A_B_3x3.p
./aaa/C_X_B_91x91.p
./aaa/C_G_6x6.p
./aaa/C_G_7x7.p
./aaa/C_A_G_113x113.p
./aaa/C_A_G_8x8.p
./aaa/C_B_G.p
./aab/...
I tried many bash for loops with sed and the last one was the following :
IFS=$'\n'
for ofic in `find * -type d -name 'A'`; do
fic=`echo $ofic|sed -e 's/\/A$//'`
for ftr in `ls -b $ofic | grep -E '.png$'`; do
nfi=`echo $ftr|sed -e 's/(_\d+[x]\d+)?/_A\1/'`
echo mv \"$ofic/$ftr\" \"$fic/$nfi\"
done
done
But yet with no success... This \1 does not get inserted in the $nfi...
This is the last one I tried, only working on 1 folder (which is a subfolder of a huge folder collection) and after over 60 minutes of unsuccessful trials, I'm here with you guys.

I modified your script so that it works for all your examples.
IFS=$'\n'
for ofic in ???/?; do
IFS=/ read fic fia <<<$ofic
for ftr in `ls -b $ofic | grep -E '\.p.*$'`; do
nfi=`echo $ftr|sed -e "s/_[0-9]*x[0-9]*/_$fia&/;t;s/\./_$fia./"`
echo mv \"$ofic/$ftr\" \"$fic/$nfi\"
done
done
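The renaming rule itself can be tried out in isolation before touching any files. A minimal sketch that wraps a similar sed substitution in a helper and prints the mv commands for a few names from the question; new_name is a made-up helper, and it assumes folder names contain no / or & characters.

```shell
# new_name FOLDER FILE: print FILE with _FOLDER inserted before a trailing
# _<digits>x<digits> part, or before the extension when there is none.
new_name() {
    printf '%s\n' "$2" |
        sed -e "s/_[0-9][0-9]*x[0-9][0-9]*\./_$1&/" -e t -e "s/\.[^.]*\$/_$1&/"
}

# Dry run: only prints the mv commands, nothing is renamed.
for pair in A/C_17x17.p A/C.p B/C_A_3x3.p G/C_B.p; do
    folder=${pair%%/*}
    file=${pair#*/}
    echo mv "./aaa/$pair" "./aaa/$(new_name "$folder" "$file")"
done
```

The t command skips the second substitution when the first one already matched, which is the same idea as the ;t; in the one-liner above.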

# it's easier to change to here first
cd aaa
# process every file
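# find * (rather than find .) keeps the leading ./ off the paths,
# which the substring offsets below rely on
for f in $(find * -type f); do
# strips everything from the first / onwards, so this is our foldername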
foldername=${f/\/*/}
# creates the new filename from substrings of the
# original filename concatenated to the foldername
newfilename=".${f:1:3}${foldername}_${f:4}"
# if you are satisfied with the output, just leave out the `echo`
# from below
echo mv ${f} ${newfilename}
done
Might work for you.
See here in action. (slightly modified, as ideone.com handles STDIN/find differently...)
