Iterating over directory with specified path in Bash - regex

pathToBins=$1
bins="${pathToBins}contigs.fa.metabat-bins-*"
for fileName in $bins
do
echo $fileName
done
My goal is to attach a path to my file name. I can iterate over a folder and get the file name when I don't attach the path. My challenge is when I add the path echo fileName my regular expression no longer works and I get "/home/erikrasmussen/Desktop/Script/realLargeMetaBatBinscontigs.fa.metabat-bins-*" where the regular expression '*' is treated like a string. How can I get the path and also the full file name while iterating over a folder of files?

Although I don't really know how your files are arranged on your hard drive, a casual glance at "/home/erikrasmussen/Desktop/Script/realLargeMetaBatBinscontigs.fa.metabat-bins-*" suggests that it is missing a / before contigs. If that is the case, then you should change your definition of bins to:
bins="${pathToBins}/contigs.fa.metabat-bins-*"
However, it is much more robust to use bash arrays instead of relying on filenames to not include whitespace and metacharacters. So I would suggest:
bins=(${pathToBins}/contigs.fa.metabat-bins-*)
for fileName in "${bins[#]}"
do
echo "$fileName"
done
Bash normally does not expand a pattern which doesn't match any file, so in that case you will see the original pattern. If you use the array formulation above, you could set the bash option nullglob, which will cause the unmatched pattern to vanish instead, leaving an empty array.

Related

How to update path in Perl script in multiple files

I am working on creating some training material where I am using perl. One of the things I want to do is have the scripts be set up for the student correctly, regardless of where they extra the compressed files. I am working on a Windows batch file that will copy the perl templates to the working location and then update path in the copy of the perl template files to the correct location. The perl template have this as the first line:
#!_BASE_software/perl/bin/perl.exe
The batch file looks like this:
SET TRAINING=%~dp0
copy %TRAINING%\template\*.pl %TRAINING%work
%TRAINING%software\perl\bin\perl -pi.bak -e 's/_BASE_/%TRAINING%/g' %TRAINING%work\*.pl
I have a few problems with this:
Perl doesn't seem to like the wildcard in the filename
It turns out that %TRAINING% is going to expand into a string with backslashes which need to be converted into forwardslashes and needs to be escaped within the regex.
How do I fix this?
First of all, Windows doesn't use the shebang line, so I'm not sure why you're doing any of this work in the first place.
Perl will read the shebang line and look for options if perl is found in the path, even on Windows, but that means that #!perl is sufficient if you want to pass options via the shebang line (e.g. #!perl -n).
Now, it's possible that you use Cygwin, MSYS or some other unix emulation instead of Windows to run the program, but you are placing a Windows path in the shebang line (C:...) rather than a unix path, so that doesn't make sense either.
There are three additional problems with the attempt:
cmd uses double-quotes for quoting.
cmd doesn't perform wildcard expansion like sh, so it's up to your program do it.
You are trying to generate Perl code from cmd. ouch.
If we go ahead, we get:
"%TRAINING%software\perl\bin\perl" -MFile::DosGlob=glob -pe"BEGIN { #ARGV = map glob, #ARGV; $base = $ENV{TRAINING} =~ s{\\}{/}rg } s/_BASE_/$base/g" -i.bak -- %TRAINING%work\*.pl
If we add line breaks for readability, we get the following (that cmd won't accept):
"%TRAINING%software\perl\bin\perl"
-MFile::DosGlob=glob
-pe"
BEGIN {
#ARGV = map glob, #ARGV;
$base = $ENV{TRAINING} =~ s{\\}{/}rg
}
s/_BASE_/$base/g
"
-i.bak -- %TRAINING%work\*.pl

Mass rename in shell script

I have a bunch of files which are of this format:
blabla.log.YYYY.MM.DD
Where YYYY.MM.DD is something like (2016.01.18)
I have quite a few folders with about 1000 files in each, so I wanted to have a simple script to rename them. I want to rename them to
blabla.log
So basically, I'm just stripping the date at the end. Here is what I have:
for f in [a-zA-Z]*.log.[0-9][0-9][0-9][0-9].[0-9][0-9].[0-9][0-9]; do
mv -v $f ${f#[0-9][0-9][0-9][0-9].[0-9][0-9].[0-9][0-9]};
done
This script outputs this:
mv: `blabla.log.2016.01.18' and `blabla.log.2016.01.18' are the same file
For more information:
I'm on windows, but I run this script in gitbash
For some reason, my gitbash doesn't recognize the "rename" command
Some regex patterns (like [0-9]{4} don't seem to work)
I'm really at a lost. Thanks.
EDIT: I need to rename every single file that has a date at the end and that is of the from: *.log.2016.01.18. They all need to keep their original names. All that should change is the removal of the date.
You have to use % instead of #: you want to remove from the end, not the start of your string.
Also, you're missing a . in what has to be removed, you don't want to end up with blabla.log..
Quoting the variable names prevents surprises when file names contain special characters.
Together:
mv -v "$f" "${f%.[0-9][0-9][0-9][0-9].[0-9][0-9].[0-9][0-9]}"

bulk file renaming in bash, to remove name with spaces, leaving trailing digits

Can a bash/shell expert help me in this? Each time I use PDF to split large pdf file (say its name is X.pdf) into separate pages, where each page is one pdf file, it creates files with this pattern
"X 1.pdf"
"X 2.pdf"
"X 3.pdf" etc...
The file name "X" above is the original file name, which can be anything. It then adds one space after the name, then the page number. Page numbers always start from 1 and up to how many pages. There is no option in adobe PDF to change this.
I need to run a shell command to simply remove/strip out all the "X " part, and just leave the digits, like this
1.pdf
2.pdf
3.pdf
....
100.pdf ...etc..
Not being good in pattern matching, not sure what regular expression I need.
I know I need something like
for i in *.pdf; do mv "$i$" ........; done
And it is the ....... part I do not know how to do.
This only needs to run on Linux/Unix system.
Use sed..
for i in *.pdf; do mv "$i" $(sed 's/.*[[:blank:]]//' <<< "$i"); done
And it would be simple through rename
rename 's/.*\s//' *.pdf
You can remove everything up to (including) the last space in the variable with this:
${i##* }
That's "star space" after the double hash, meaning "anything followed by space". ${i#* } would remove up to the first space.
So run this to check:
for i in *.pdf; do echo mv -i -- "$i" "${i##* }" ; done
and remove the echo if it looks good. The -i suggested by Gordon Davisson will prompt you before overwriting, and -- signifies end of options, which prevents things from blowing up if you ever have filenames starting with -.
If you just want to do bulk renaming of files (or directories) and don't mind using external tools, then here's mine: rnm
The command to do what you want would be:
rnm -rs '/.*\s//' *.pdf
.*\s selects the part before (and with) the last white space and replaces it with empty string.
Note:
It doesn't overwrite any existing files (throws warning if it finds an existing file with the target name).
And this operation is failsafe. You can get back the changes made by last rnm command with rnm -u.
Here's a list of documents for rnm.

export filenames to temp file bash

I have a lot of files in multiple directories that all have the following setup for the filename:
prob123456_01
I want to delete the trailing "_01" off of each file name and export them to a temp file. How exactly would I delete the trailing "_01" as well as export? I am rather new to scripting so any help would be greatly appreciated!
As you've tagged with bash, I'll assume that you can use globstar
shopt -s globstar # enable globstar
for f in **_[0-9][0-9]; do echo "${f%_*}"; done > tmp
With globstar enabled, the pattern **_[0-9][0-9] matches any file ending in _, followed by any 2 digit number, in the current directory and any subdirectories. ${f%_*} removes the end of the file name using bash's built-in string manipulation functionality.
Better yet, as Charles Duffy suggests (thanks), you can use an array instead of a loop:
files=( **_[0-9][0-9] ); printf '%s\n' "${files[#]%_*}"
The array is filled the filenames that match the same pattern as before. ${files[#]%_*} removes the last part from each element of the array and passes them all as arguments to printf, which prints each result on a separate line.
Either of these approaches is likely to be quicker than using find as everything is done in the shell, without executing any separate processes.
Previously I had suggested to use the pattern **_{00..99}, although this is not ideal for a couple of reasons. It is less efficient, as it expands to **_00, **_01, **_02, ..., **_99. Also, any of those 100 patterns that don't match will be included literally in the output unless another option, nullglob is enabled.
It's up to you whether you use [0-9] or [[:digit:]] but the advantage of the latter is that it matches all characters defined to be a digit, which may vary depending on your locale. If this isn't a concern, I would go with the former.
If I understand you correctly, you want a list of the filenames without the trailing _01. The following would do that:
find . -type f -name '*_01' | sed 's/_01$//' > tmp.lst
find . -type f -name '*_01' looks for all the files in the current directory, and its descendent directories, for files with names ending in _01.
| is the so-called pipe, handing the results of the left-hand call to the right-hand call.
sed 's/_01$//' removes the _01 from the end of each filename.
> tmp.lst writes the result into the file tmp.lst
These are all pretty basic parts of working with bash and its likes, so it might be a good idea to look at a tutorial or two and familiarize yourself with those and a few others ;)

Optimal procedure to rename files in a linux directory

This question pertains to renaming files in a directory on a Linux system wherein the affected files appear in this general format:
index.html?p=155
index.html?page_id=10
index.html?author=2&paged=5
index.html?feed=rss2&tag=search-engine
index.html?tag=social-media
Might there be a shell level "rename" command that I can use to replace the question marks (?) with an underscore (_) in each file within a directory?
Thank you in advance for any advice or information!
I would prefer the rename command myself, though sometimes a roll-your-own for loop can be more targeted.
Note: rename takes a sed expression as the argument to change and filenames as the last argument. The proper call to use would be:
rename 's/\?/_/' index*
because the ? indicates 0 or 1 of a preceding character when not \ escaped.
This is also easier to toss into a find command which can operate recursively, etc:
find . -name index.html* -exec rename 's/\?/_/' {} +
for file in index.html\?*; do
new=${file/\?/_} # Substitute underscore for ?
mv "$file" "$new" # Rename the file
done
See the Parameter Expansion section of the bash man page for information on the substitution syntax used.
You can use the rename command.
rename '?' '_' *
The first parameter is the expression you want to replace, the second argument is the string to replace the first parameter with, and the final option is the selection of files to apply it to ( all in the current directory, in this case )
See the man page for more details.http://ss64.com/bash/rename.html