In the script below, I want to add the file name as a prefix to each output line generated by grep.
I don't know why sed is not replacing $file with the file name; I am getting the literal $file as the prefix. Can anyone help me with this?
function traverse() {
for file in "$1"/*
do
if [ ! -d "${file}" ] ; then
if [ ${file: -2} == ".c" ]
then
./sed.sed "$file" > latest_log.txt #To remove all the comments
grep -nir "$2" latest_log.txt >> output.log #grep matched lines
sed -i "s/^/$file/" output.log > grepoutput3.txt #prefix filename($file here)
echo "${file} is a c file"
fi
else
traverse "${file}" "$2"
fi
done
}
function main() {
traverse "$1" "$2"
}
main "$1" "$2"
The line below should add the file name as a prefix, but $file is not being replaced. Apart from that, the whole script works fine.
sed -i "s/^/$file/" ex.txt > grepoutput3.txt
Example: search for "welcome" in all .c files of a folder. Take first_file.c as one of them.
FIRST_FILE.C
welcome here
/* welcome here */
//welcome here
welcome here2
Expected output
/DIR/FIRST_FILE.C:1: welcome here
/DIR/FIRST_FILE.C:4: welcome here2
With my script, the output is
1: welcome here
4: welcome here2
So I am trying to prefix the file name (/DIR/FIRST_FILE.C) to each line. We can get that file name from $file, but sed is not interpreting it.
Nowhere in your script is any code that creates ex.txt. Is that intended?
To debug this, add set -x near the beginning of your script.
Also worth noting: you have tagged this question "sh" (not bash) but
if [ ${file: -2} == ".c" ]
is full of nonportable bashisms: the string equality operator is = and the ${file: -2} syntax is not understood by a POSIX sh. An equivalent sh code would use
case $file in
(*.c) ...;;
esac
For the filesystem traversal: Use find!
find FOLDER -type f -name '*.c' -exec bash yourscript.sh {} \;
In yourscript.sh you can access the filename in $1.
I could help further with implementing that script but it is pretty unclear what you are trying to do.
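On the sed line itself, two things are likely to go wrong regardless of quoting: $file contains slashes, which collide with the / delimiter of s/^/$file/, and sed -i edits output.log in place without printing anything, so the redirect leaves grepoutput3.txt empty. Below is a minimal sketch of one way to get the expected output, assuming the comment-stripped text arrives on standard input. The helper name prefix_matches is made up for illustration.

```shell
#!/bin/bash
# prefix_matches FILE PATTERN
# Reads (comment-stripped) text on stdin, greps it case-insensitively and
# prints every hit as FILE:LINE:TEXT. prefix_matches is a hypothetical helper.
prefix_matches() {
    local file=$1 pattern=$2
    # awk receives the name through the environment, so slashes,
    # ampersands or backslashes in it need no escaping
    grep -ni -- "$pattern" | FILE_NAME=$file awk '{ print ENVIRON["FILE_NAME"] ":" $0 }'
}

# Demo with the sample lines from the question (comments already removed)
printf 'welcome here\n\n\nwelcome here2\n' | prefix_matches /DIR/FIRST_FILE.C welcome
```

Inside traverse this could be used as ./sed.sed "$file" | prefix_matches "$file" "$2" >> grepoutput3.txt, which also avoids the two intermediate files.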
I have a list of html files that contain a certain tag in each file as below:
<div id="myID" style="display:none">1_34876</div>
I would like to search for that tag in each file and rename each file according to the number within that tag, i.e. rename the file containing the tag above to 1_34876.html, and so forth.
Is there a regex or bash command using grep or awk that can accomplish this?
So far I was able to grep each file using the following command but stuck on how to rename the files:
grep '<div id="myID" style="display:none">.*</div>' ./*.html
An additional bonus would be if the command doesn't overwrite duplicate files, e.g. if another file contains the 1_34876 tag above then the second file would be renamed as 1_34876 v2.html or something similar.
Kindly advise if this can be achieved in a way that doesn't require programming.
Many thanks indeed.
Ali
You can use the following script to achieve your goal. Note that grep -P requires GNU grep; for the script to work on macOS, install GNU grep via Homebrew and substitute the grep call with ggrep.
The script will search the current directory and all its subdirectories for *.html files.
It will substitute only the names of the files that contain the specific tag.
For multiple files that contain the same tag, each subsequent file apart from the first will have an identifier appended to its name, e.g. 1_234.html, 1_234_1.html, 1_234_2.html.
For files that contain multiple tags, the first tag encountered will be used.
#!/bin/bash
rename_file ()
{
# Check that file name received is an existing regular file
file_name="$(realpath "${1}")"
if [ ! -f "${file_name}" ]; then
echo "No argument or non existing file or non regular file provided"
exit 1
fi
# Get the tag number. If the number does not exist, the variable tag will be
# empty. The first tag on a file will be used if there are multiple tags
# within a file.
tag="$(grep -oP -m 1 '(?<=<div id="myID" style="display:none">).*?(?=</div>)' \
-- "${file_name}")"
# Rename the file only if it contained a tag
if [ -n "${tag}" ]; then
file_path="$(dirname "${file_name}")"
# Change directory to the file's location silently
pushd "${file_path}" > /dev/null || return
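# Check for multiple occurrences of files with the same tag
if [ -e "${tag}.html" ]; then
# Group the two -name tests so that -type f applies to both; tr strips the
# padding that BSD wc puts around the count
counter="$(find ./ -maxdepth 1 -type f \( -name "${tag}.html" -o -name "${tag}_*.html" \) | wc -l | tr -d '[:space:]')"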
tag="${tag}_${counter}"
fi
# Rename the file
mv "${file_name}" "${tag}.html"
# Return to previous directory silently
popd > /dev/null || return
fi
}
# Necessary in order to call rename_file from find command within main
export -f rename_file
# The entry point function of the script. This function searches for all the
# html files in the directory where the script is run, and all subdirectories.
# The function calls rename_file on each of the found files.
main ()
{
find ./ -type f -name "*.html" -exec bash -c 'rename_file "${1}"' _ {} \;
}
main
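If installing GNU grep is not an option, the lookaround pattern can be replaced by a plain sed substitution. This is a minimal sketch under that assumption: extract_tag is a made-up name, and the pattern assumes the tag sits on a single line and its text contains no "<".

```shell
#!/bin/bash
# extract_tag FILE
# Prints the text of the myID div on the first line of FILE that has one,
# or nothing if there is none. Hypothetical helper; no grep -P needed.
extract_tag() {
    sed -n 's/.*<div id="myID" style="display:none">\([^<]*\)<\/div>.*/\1/p' < "$1" |
        head -n 1
}

# Demo on a throwaway file
demo=$(mktemp)
printf '<p>x</p>\n<div id="myID" style="display:none">1_34876</div>\n' > "$demo"
extract_tag "$demo"
rm -f "$demo"
```

In rename_file the assignment would then read tag="$(extract_tag "${file_name}")".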
As we do not know the exact content of the files and folders, we need to take care to:
Check that the tag is unique
Check that a file with the new name does not already exist
Not rename files that have already been renamed
Steps:
Search for files with the specified tag: grep -o '<div id="myID" style="display:none">.*</div>' ./*.html
This gives us output in the format file_path:tag.
Extract file_path using the regexp [^:]+ (one or more characters that are not ":") with -o (print only the matched part), and take the first occurrence using head -1.
Get the new file name from the tag using the regexp ">.*<", then remove the "<" and ">" using sed.
We could use a single regexp here that skips the <>, but it is more complex and less readable: grep -oP ">\K.*(?=<)"
Rename the files, log errors and script actions, and return a non-zero exit code in case of issues.
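Unless something else is really required, it is better to use functions with local variable declarations. Declare them with function fn_rename_html_files {} rather than fn_rename_html_files() {}: in ksh, typeset variables are local only in functions declared with the function keyword (in bash, typeset is local in both forms). Otherwise, with complex logic and nested function calls, one function may change another function's variables, and debugging will not be so easy.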
function fn_rename_html_files {
typeset l_files_path="${1-.}"
typeset l_html_ext=".html"
typeset l_proc_exit_code=0
typeset l_proc_name="${FUNCNAME[0]}"
typeset l_files_to_search="${l_files_path}/*${l_html_ext}"
typeset l_tag_reg_exp='<div id="myID" style="display:none" *>.*</div>'
typeset l_matched_files=$(grep -o "$l_tag_reg_exp" $l_files_to_search | sort -u)
typeset l_prev_file_path=""
# loop through matching files
while IFS= read -r l_file_path_with_match_tag
do
l_current_file_path=$(echo "$l_file_path_with_match_tag" | egrep -o "([^:])+" | head -1 )
l_current_file_name=$(basename "$l_current_file_path")
#echo "l_current_file_path $l_current_file_path "
#echo "l_current_file_name $l_current_file_name "
l_new_file_name_from_tag=$(echo "$l_file_path_with_match_tag" | egrep -o ">.*<" | sed "s/[><]//g" | head -1 )
l_new_file_name="${l_new_file_name_from_tag}${l_html_ext}"
l_new_file_path="$l_files_path/$l_new_file_name"
#echo "$l_new_file_path $l_new_file_path "
#echo "$l_new_file_name $l_new_file_name "
if [[ "$l_prev_file_path" == "$l_current_file_path" ]]; then
echo "$l_proc_name ERROR: myID tag is not unique for $l_current_file_path, skipping second renaming " >&2
let l_proc_exit_code+=1
continue
fi
if ! [[ -f "$l_new_file_path" ]]; then
if mv "$l_current_file_path" "$l_new_file_path" ; then
echo "$l_proc_name SUCCESS: renamed $l_current_file_path to $l_new_file_path"
else
echo "$l_proc_name ERROR: cannot move $l_current_file_path to $l_new_file_path" >&2
let l_proc_exit_code+=1
fi
else
if [[ "$l_current_file_name" == "$l_new_file_name" ]]; then
echo "$l_proc_name Warning: File has been already renamed: $l_current_file_name, skipping"
else
echo "$l_proc_name Warning: File with such name already exists: $l_current_file_name, skipping" >&2
let l_proc_exit_code+=1
fi
fi
l_prev_file_path=$l_current_file_path
done <<< "$l_matched_files"
return $l_proc_exit_code
}
# create test files
rm -f ./*.html
echo '<div id="myID" style="display:none">1_1</div>' > 1.html ;
echo '<div id="myID" style="display:none">1_2</div>' > 2.html ;
echo '<div id="myID" style="display:none">1_3</div>' > 3.html ;
echo '<div id="myID" style="display:none">1_3_1</div>' >> 3.html ;
# run
fn_rename_html_files
I'm currently trying to get into bash regular expressions to change multiple filenames at the same time. Here are the file names:
a_001_D_xy_S37_L003_R1_001.txt
a_001_D_xy_S37_L003_R2_001.txt
a_002_D_xy_S37_L006_R1_001.txt
a_002_D_xy_S37_L006_R2_001.txt
a_003_D_xy_S23_L003_R1_001.txt
a_003_D_xy_S23_L003_R2_001.txt
I want this as my result:
a_002_D_xy_R1.txt
a_002_D_xy_R2.txt
...
I only want to change those with *001.txt at the end. First I want to remove the _S.._L00. in the filenames and the 001 in the end. I split this procedure in two parts:
for file in *001.txt;
do
echo ${file#_S.._L..6}
done
This loop already does not work. As a second alternative I tried:
for file in *001.fastq.gz;
do
echo ${file/_S.._L00./}
done
but the filenames are again unchanged. (I just use echo here to see the results. If it works I will replace it with mv ${file} ${regularexpression})
Thanks for help!
Considering that you need lots of different fields it is possibly better to just split the filename and then reconstruct it as you wish.
I suggest using an array built by splitting the original filename with _. Then you just reconstruct the new name by using the fields that you wish.
for file in *001.txt; do
echo "FILE: $file"
IFS='_' read -r -a fileFields <<< "$file"
echo "FILE FIELDS: "
for index in "${!fileFields[@]}"; do
echo "- $index ${fileFields[index]}"
done
fileName="${fileFields[0]}_${fileFields[1]}_${fileFields[2]}_${fileFields[3]}_${fileFields[-2]}.txt"
echo "NEW FILE NAME: $fileName"
# mv "$file" "$fileName"
done
The echo commands are just for debugging; you can remove them all once you understand the code.
However, if you really need to split the string using BASH expressions you can check this post:
Extracting part of a string to a variable in bash or take a look at this BASH cheat sheet.
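As for why the attempts in the question leave the names unchanged: parameter expansion matches shell glob patterns, not regular expressions, so a dot is a literal dot and ? stands for any single character. In addition, ${file#pattern} only removes a match at the very start of the value. A minimal sketch along those lines, with short_name as a made-up helper name:

```shell
#!/bin/bash
# short_name FILE
# Turns a_001_D_xy_S37_L003_R1_001.txt into a_001_D_xy_R1.txt.
# Hypothetical helper; it only computes the new name and renames nothing.
short_name() {
    local name=$1
    name=${name/_S??_L00?/}      # drop the first _S.._L00. part (glob, not regex)
    name=${name%_001.txt}.txt    # drop the trailing _001 before .txt
    printf '%s\n' "$name"
}

for file in a_001_D_xy_S37_L003_R1_001.txt a_003_D_xy_S23_L003_R2_001.txt; do
    echo "$file -> $(short_name "$file")"
done
```

In the real loop, echo would be replaced by mv "$file" "$(short_name "$file")" once the printed names look right.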
Try a function like this. The glob is expanded once, before the loop starts, so renaming files inside the loop does not disturb the iteration.
functionRename(){
for file in *_001.txt
do
# drop the _S??_L??? part: a_001_D_xy_S37_L003_R1_001.txt -> a_001_D_xy_R1_001.txt
tmp="${file%_S??_*}${file#*_L???}"
# drop the trailing _001: a_001_D_xy_R1_001.txt -> a_001_D_xy_R1.txt
mv "${file}" "${tmp%_001*}.txt"
done
}
functionRename
If the arguments are files, I want to change their extensions to .file.
That's what I got:
#!/bin/bash
while [ $# -gt 0 ]
do
if [ -f $1 ]
then
sed -i -e "s/\(.*\)\(\.\)\(.*\)/\1\2file" $1
fi
shift
done
The script runs, but it doesn't do anything. Another problem: if a file doesn't have any extension, my sed command will not work, right? Please help.
sed is for manipulating the contents of files, not the filename itself.
Option 1, taken from this answer by John Smith:
filename="file.ext1"
mv "${filename}" "${filename/%ext1/ext2}"
Option 2, taken from this answer by chooban:
rename 's/\.ext/\.newext/' ./*.ext
Option 3, taken from this answer by David W.:
find . -name "*.ext1" -print0 | while IFS= read -r -d '' file
do
mv "$file" "${file%.*}.ext2"
done
and more is here.
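The ${file%.*} expansion from Option 3 can be wrapped into a small helper that also covers the case from the question where a file has no extension at all. This is only a sketch: with_ext is a name made up for the example, and it only computes the new name, leaving the mv to the caller.

```shell
#!/bin/bash
# with_ext PATH EXT
# Prints PATH with its last extension replaced by EXT, or with EXT appended
# when the file name has no extension. Hypothetical helper for illustration.
with_ext() {
    local path=$1 ext=$2
    local base=${path##*/}          # file name without leading directories
    case $base in
        ?*.*) printf '%s\n' "${path%.*}.$ext" ;;  # has an extension: swap it
        *)    printf '%s\n' "$path.$ext" ;;       # none (or a dotfile): append
    esac
}

with_ext notes.txt file
with_ext dir.d/README file
```

Checking the base name first matters for paths such as dir.d/README, where a bare ${path%.*} would cut at the dot in the directory name. A loop over the script arguments could then call [ -f "$1" ] && mv -- "$1" "$(with_ext "$1" file)".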
UPDATE (a comment asked what % and {} are doing):
"${variable}other_chars": the braces delimit the variable name, so you can expand a variable directly next to other characters in a string. %.* inside the braces means: take the value of the variable and strip the shortest match of the pattern .* from the end of the value. For example, if your variable holds filename.txt, "${variable%.*}" returns just filename.
Using a shell function to wrap a GNU sed evaluate (e) command:
mvext ()
{
ext="$1";
while shift && [ "$1" ]; do
sed 's/.*/mv -iv "&" "&/
s/\(.*\)\.[^.]*$/\1/
s/.*/&\.'"${ext}"'"/e' <<< "$1";
done
}
Tests, given files bah and boo, and the extension should be .file, which is then changed to .buzz:
mvext file bah boo
mvext buzz b*.file
Output:
'bah' -> 'bah.file'
'boo' -> 'boo.file'
'bah.file' -> 'bah.buzz'
'boo.file' -> 'boo.buzz'
How it works:
The first arg is the file extension, which is stored in $ext.
The while loop handles each file name separately, since a name might include escaped spaces and whatnot. If the file names were certain not to contain such characters, the while loop could probably be avoided.
sed reads standard input, provided by a bash here string <<< "$1".
The sed code changes each name foo.bar (or even just plain foo) to the string mv -iv "foo.bar" "foo.file", then runs that string with the evaluate command. The -iv options show what's been moved and prompt before an existing file would be overwritten.
I am trying to write a script that should work as described below, but somehow I am not able to get the syntax right.
I have folders like S_12_O_319_K4me1.
The contents of each folder are files like S_12_O_319_K4me1_S12816.sorted.bam.
So I wanted to write a script that goes into each folder in a loop, identifies the *.bam file whose name starts with the folder name, and performs the operation, but I am unable to get the regex right. This is what I tried:
#!/bin/bash
#$ -S /bin/bash
spp_run=/path/phantompeakqualtools/run_spp.R
bam_loc=/path/ChIP-Seq/output
samples="S_12_O_319_K27me3
S_12_O_319_K4me1
S_12_O_319_K4me3
S_12_O_319_K27ac"
for s in $samples; do
echo "Running SPP on $s ..."
Rscript $spp_run -c=$bam_loc/$s/${s}_S[[0-9]+\.sorted.bam -savp -out=$bam_loc/$s/${s}".run_spp.out"
done
I am not able to match the digits with the above regex.
Where am I getting it wrong?
Edit:
I tried the version below, but it still does not work. The problem is with argument parsing in the Rscript, but why would this be a problem?
#!/bin/bash
#$ -S /bin/bash
spp_run=/path/tools/phantompeakqualtools/run_spp.R
bam_loc=/path/ChIP-Seq/output
samples="S_12_O_319_K27me3
S_12_O_319_K4me1
S_12_O_319_K4me3"
for s in $samples; do
echo "Running SPP on $s ..."
echo $bam_loc/$s/${s}_S*.sorted.bam
inbam=$bam_loc/$s/${s}_S*.sorted.bam
echo $inbam
Rscript $spp_run -c=$inbam -savp -out=$bam_loc/$s/${s}".run_spp.out"
done
echo "done"
Error
Error in parse.arguments(args) :
ChIP File:/path/ChIP-Seq/output/S_12_O_319_K27me3/S_12_O_319_K27me3_S*.sorted.bam does not exist
Execution halted
It does not recognize the file, even though echo $inbam prints /path/ChIP-Seq/output/S_12_O_319_K27me3/S_12_O_319_K27me3_S12815.sorted.bam
You can use a regex in a find command :
export spp_run=/path/phantompeakqualtools/run_spp.R
export bam_loc=/path/ChIP-Seq/output
export dir
samples=(S_12_O_319_K27me3 S_12_O_319_K4me1 S_12_O_319_K4me3 S_12_O_319_K27ac)
for dir in "${samples[@]}"; do
find . -type f -regex ".*/*${dir}_S[0-9]+\.sorted\.bam" \
-exec bash -c 'echo Rscript $spp_run -c=$bam_loc/${dir}/${1##*/} -savp -out=$bam_loc/${dir}/${dir}".run_spp.out"' _ {} \;
done
Note : just remove the echo before the Rscript if the output meets your needs.
I found an answer to my query; the code is below. I realized that the Rscript requires the full path and the full file name, so I assigned the output of the echo command to a variable and passed that to the Rscript as the input file argument. It now gets the full path with the full file name, so it recognizes the input file.
#!/bin/bash
#$ -S /bin/bash
spp_run=/path/tools/phantompeakqualtools/run_spp.R
bam_loc=/path/ChIP-Seq/output
samples="S_12_O_319_K27me3
S_12_O_319_K4me1
S_12_O_319_K4me3"
for s in $samples; do
echo "Running SPP on $s ..."
echo $bam_loc/$s/${s}_S*.sorted.bam
inbam=$bam_loc/$s/${s}_S*.sorted.bam
echo $inbam
infile=`echo $inbam`
Rscript $spp_run -c=$infile -savp -out=$bam_loc/$s/${s}".run_spp.out"
done
echo "done"
Thanks everyone for the suggestions and comments. My code is not elegant but it is working so I put the answer here.
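A likely explanation for the earlier failure: a glob is not expanded in a variable assignment, and in -c=$inbam the shell tries to match the whole word, including the leading -c=, against existing files, finds none, and passes the pattern through literally. Below is a sketch of an alternative that expands the glob into an array first. find_bam is a hypothetical helper, and the directory layout is the one described in the question.

```shell
#!/bin/bash
# find_bam DIR SAMPLE
# Prints the first file matching DIR/SAMPLE/SAMPLE_S*.sorted.bam and
# returns 1 if there is none. Hypothetical helper for illustration.
find_bam() {
    local dir=$1 sample=$2
    local -a matches
    # Inside an array assignment the glob is expanded, one element per match
    matches=( "$dir/$sample/${sample}"_S*.sorted.bam )
    # Without a match the pattern stays as a literal string, so test for it
    [ -e "${matches[0]}" ] || return 1
    printf '%s\n' "${matches[0]}"
}

# Demo in a throwaway directory
tmp=$(mktemp -d)
mkdir -p "$tmp/S_12_O_319_K4me1"
: > "$tmp/S_12_O_319_K4me1/S_12_O_319_K4me1_S12816.sorted.bam"
find_bam "$tmp" S_12_O_319_K4me1
rm -rf "$tmp"
```

In the loop this could be used as inbam=$(find_bam "$bam_loc" "$s") && Rscript "$spp_run" -c="$inbam" -savp -out="$bam_loc/$s/$s.run_spp.out", which also skips samples that have no matching file.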
(I'm in a Bash environment, Cygwin on a Windows machine, with awk, sed, grep, perl, etc...)
I want to add the last folder name to the filename, just before the last underscore (_) followed by numbers or at the end if no numbers are in the filename.
Here is an example of what I have (hundreds of files needed to be reorganized) :
./aaa/A/C_17x17.p
./aaa/A/C_32x32.p
./aaa/A/C.p
./aaa/B/C_12x12.p
./aaa/B/C_4x4.p
./aaa/B/C_A_3x3.p
./aaa/B/C_X_91x91.p
./aaa/G/C_6x6.p
./aaa/G/C_7x7.p
./aaa/G/C_A_113x113.p
./aaa/G/C_A_8x8.p
./aaa/G/C_B.p
./aab/...
I would like to rename all these files like this:
./aaa/C_A_17x17.p
./aaa/C_A_32x32.p
./aaa/C_A.p
./aaa/C_B_12x12.p
./aaa/C_B_4x4.p
./aaa/C_A_B_3x3.p
./aaa/C_X_B_91x91.p
./aaa/C_G_6x6.p
./aaa/C_G_7x7.p
./aaa/C_A_G_113x113.p
./aaa/C_A_G_8x8.p
./aaa/C_B_G.p
./aab/...
I tried many bash for loops with sed and the last one was the following :
IFS=$'\n'
for ofic in `find * -type d -name 'A'`; do
fic=`echo $ofic|sed -e 's/\/A$//'`
for ftr in `ls -b $ofic | grep -E '.png$'`; do
nfi=`echo $ftr|sed -e 's/(_\d+[x]\d+)?/_A\1/'`
echo mv \"$ofic/$ftr\" \"$fic/$nfi\"
done
done
But yet with no success... This \1 does not get inserted in the $nfi...
This is the last one I tried, only working on 1 folder (which is a subfolder of a huge folder collection) and after over 60 minutes of unsuccessful trials, I'm here with you guys.
I modified your script so that it works for all your examples.
IFS=$'\n'
for ofic in ???/?; do
IFS=/ read fic fia <<<$ofic
for ftr in `ls -b $ofic | grep -E '\.p.*$'`; do
nfi=`echo $ftr|sed -e "s/_[0-9]*x[0-9]*/_$fia&/;t;s/\./_$fia./"`
echo mv \"$ofic/$ftr\" \"$fic/$nfi\"
done
done
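# it's easier to change to here first
cd aaa || exit
# process every .p file that sits one folder below aaa
for f in */*.p; do
# strips everything from the first / on, so this is our foldername
foldername=${f%%/*}
# the file name without the folder and without the .p extension
name=${f##*/}
name=${name%.p}
# creates the new filename: the foldername goes before a trailing
# _<digits>x<digits> part, or at the end if there is none
if [[ $name =~ ^(.*)(_[0-9]+x[0-9]+)$ ]]; then
newfilename="./${BASH_REMATCH[1]}_${foldername}${BASH_REMATCH[2]}.p"
else
newfilename="./${name}_${foldername}.p"
fi
# if you are satisfied with the output, just leave out the `echo`
# from below
echo mv "${f}" "${newfilename}"
done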
Might work for you.
See here in action. (slightly modified, as ideone.com handles STDIN/find differently...)