single repeating command with input and output files - regex

I have been trying to learn how to run a single command multiple times from the command line. I have learned how to repeat a command that takes no input or output files, but it gets more complicated when the command needs them.
The cp command requires both, so let's use it as an example. I look for all images with a .png extension and copy them. The way I have come up with after using Google is:
find -regex ".*\.\(png\)" -exec cp {} {}3 \;
The only problem is that I have to append some digit after the name, so the copy ends up as something like file.png3 instead of file.png. I can't figure out how to do it differently: putting the digit before the name doesn't seem to work either.
Is there a better way to do this or am I going about it completely the wrong way?

I'm not sure how you might do that in a single find command, but you could split it out. First, find the files with find. Then use sed to strip the .png extension. Finally, use xargs to run cp on each file. Like this:
find -regex ".*\.\(png\)" | sed 's/\.png$//' | xargs -I {} cp {}.png {}_copy.png
If you didn't know, the pipe "|" will send the output of one program into the next.
Alternatively, you could just modify the beginning of the filename (so 3img.png instead of img.png3) or copy to a new folder.
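If you do want it in a single find command after all, here is a minimal sketch using a small inline shell script (the _copy suffix is just an example name, not something the question requires):
find . -name '*.png' ! -name '*_copy.png' -exec sh -c '
  for f; do
    # strip the .png suffix, add the new suffix, then re-append the extension
    cp "$f" "${f%.png}_copy.png"
  done
' sh {} +
The ! -name '*_copy.png' guard just keeps find from picking up copies it has already made.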

Related

locate command with regular expression to locate multiple files at a time

There are files like compile.x86.log, compile.x86.log-1, compile.x86.log-2, compile.x86.log-3 and error.log, error.log_1, error.log_2, error.log_3. I want to use the locate command to find only compile.x86.log and error.log among them.
So far I tried
echo $(/usr/bin/locate -ir '^/\([^.][^/]\+/\)\+compile.x86\.log$')
echo $(/usr/bin/locate -ir '^/\([^.][^/]\+/\)\+error\.log$')
With the above approach of running the two searches individually, the search/execution time is 0m18.068s.
How can I combine the two?
Also, is there a better solution using only the locate command, preferably with the -b option to search for the exact names (compile.x86.log and error.log), that takes less time?
I have tried echo $(/usr/bin/locate -i -b "compile.x86.log")
It takes only 0m1.887s to execute, but it returns compile.x86.log-1, compile.x86.log-2 and compile.x86.log-3 in the result as well, instead of only compile.x86.log.
Is there any way to grep the locate result so that this approach returns only compile.x86.log and error.log?
Because the locate database outputs entries as absolute path names, and the pattern match test applies to the whole path name (the globbing character * does not treat / specially),
locate -i '*/compile.x86.log' '*/error.log'
does what you want.
By the way, the echo $(…) around a command seems wasteful.
Following your logic, the simplest answer is:
echo $(/usr/bin/locate -ir '^/\([^.][^/]\+/\)\+\(compile\.x86\|error\)\.log$')
where \(compile\.x86\|error\) is an alternation that means "this or that" (locate -r takes a basic regular expression, so the group and the | must be backslash-escaped).
Otherwise, using the find command would be better:
find . -type f \( -name "compile.x86.log" -o -name "error.log" \)
(The parentheses are needed so that -type f applies to both names, not just the first.)
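If you want to keep the fast locate -b lookup, one option is to filter its output with grep, keeping only exact basenames. A sketch, assuming your locate accepts multiple patterns (as mlocate does):
locate -i -b 'compile.x86.log' 'error.log' | grep -E '/(compile\.x86|error)\.log$'
The -b match is the quick substring search you already timed; the anchored grep then drops the -1, -2, -3 variants.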

Mass rename in shell script

I have a bunch of files which are of this format:
blabla.log.YYYY.MM.DD
Where YYYY.MM.DD is something like (2016.01.18)
I have quite a few folders with about 1000 files in each, so I wanted to have a simple script to rename them. I want to rename them to
blabla.log
So basically, I'm just stripping the date at the end. Here is what I have:
for f in [a-zA-Z]*.log.[0-9][0-9][0-9][0-9].[0-9][0-9].[0-9][0-9]; do
mv -v $f ${f#[0-9][0-9][0-9][0-9].[0-9][0-9].[0-9][0-9]};
done
This script outputs this:
mv: `blabla.log.2016.01.18' and `blabla.log.2016.01.18' are the same file
For more information:
I'm on Windows, but I run this script in Git Bash
For some reason, my Git Bash doesn't recognize the rename command
Some regex-style patterns (like [0-9]{4}) don't seem to work
I'm really at a loss. Thanks.
EDIT: I need to rename every single file that has a date at the end and that is of the form *.log.2016.01.18. They all need to keep their original names; all that should change is the removal of the date.
You have to use % instead of #: you want to remove from the end, not the start of your string.
Also, you're missing a . in what has to be removed, you don't want to end up with blabla.log..
Quoting the variable names prevents surprises when file names contain special characters.
Together:
mv -v "$f" "${f%.[0-9][0-9][0-9][0-9].[0-9][0-9].[0-9][0-9]}"

batch renaming of files with perl expressions

This should be a basic question for a lot of people, but I am a biologist with no programming background, so please excuse my question.
What I am trying to do is rename about 100,000 gzipped data files that have existing name of a code (example: XG453834.fasta.gz). I'd like to name them to something easily readable and parseable by me (example: Xanthomonas_galactus_str_453.fasta.gz).
I've tried to use sed, rename, and mmv, to no avail. If I use any of those commands in a one-off script they work fine; it's just when I try to incorporate variables into a shell script that I run into problems. I'm not getting any errors, just no names are changed, so I suspect it's an I/O error.
Here's what my script looks like:
#! /bin/bash
# change a bunch of file names
file=names.txt
while IFS=' ' read -r r1 r2;
do
mmv ''$r1'.fasta.gz' ''$r2'.fasta.gz'
# or I tried many versions of: sed -i 's/"$r1"/"$r2"/' *.gz
# and I tried many versions of: rename -i 's/$r1/$r2/' *.gz
done < "$file"
...and here are the first lines of my txt file, with a single-space delimiter:
cat names.txt
#find #replace
code1 name1
code2 name2
code3 name3
I know I can do this with python or perl, but since I'm stuck here working on this particular script I want to find a simple solution to fixing this bash script and figure out what I am doing wrong. Thanks so much for any help possible.
Also, I tried to cat the names file (see comment from Ashoka Lella below) and then use awk to move/rename. Some of the files have variable names (but will always start with the code), so I am looking for a find & replace option to just replace the "code" with the "name" and preserve the file name structure.
I suspect I am not escaping the variable within the single quotes of the perl expression, but I have pored over a lot of manuals and I can't find the way to do this.
If you're absolutely sure that the filenames don't contain spaces or tabs, you can try the following:
xargs -n2 < names.txt echo mv
This is a DRY run (it will only print what it would do); if you're satisfied with the result, remove the echo.
If you want to be asked before overwriting an existing target, use
xargs -n2 < names.txt echo mv -i
If you NEVER want to allow overwriting of the target, use
xargs -n2 < names.txt echo mv -n
Again, remove the echo once you're satisfied.
I don't think that you need to be using mmv, a simple mv will do. Also, there's no need to specify the IFS, the default will work for you:
while read -r src dest; do mv "$src" "$dest"; done < names.txt
I have double quoted the variable names as it is generally considered good practice but in this case, a space in either of the filenames will result in read not working as you expect.
You can put an echo before the mv inside the loop to ensure that the correct command will be executed.
Note that in your file names.txt, the .fasta.gz suffix is already included, so you shouldn't be adding it inside the loop as well. Perhaps that was your problem?
This should rename all files in column 1 to column 2 of names.txt, provided they are in the same folder as names.txt:
cat names.txt | awk '{print "mv "$1" "$2}' | sh
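For the point in the edit, where filenames vary after the leading code, a minimal sketch using prefix stripping (this assumes names.txt holds the bare codes and names, without the .fasta.gz suffix):
while read -r code name; do
    case $code in '#'*) continue;; esac  # skip the "#find #replace" header line
    for f in "$code"*.fasta.gz; do
        # replace only the leading code; keep the rest of the filename intact
        mv "$f" "$name${f#"$code"}"
    done
done < names.txt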

using find and rename for their intended use

Now before you facepalm and click on duplicate entry or the like, read on; this question is both theoretical and practical.
From the title it is pretty obvious what I am trying to do: find some files, then rename them. The problem is that there are so many ways to do this that I finally decided to pick one and try to figure it out, theoretically.
Let me set the stage:
Let's say I have 100 files all named like Image_200x200_nnn_AlphaChars.jpg, where nnn is an incremental number and AlphaChars is a descriptive name, e.g.:
Image_200x200_001_BlueHat.jpg
Image_200x200_002_RedHat.jpg
...
Image_200x200_100_MyCat.jpg
Enter the stage: find. Now with a simple one-liner I can find all the image files in this directory. (Not sure how to do this case-insensitively.)
find . -type f -name "*.jpg"
Enter the stage: rename. On its own, rename expects you to do the following:
rename <search> <replace> <haystack>
When I try to combine the two with -print0 and xargs and some regular expressions I get stuck, and I am almost sure it's because rename is looking for the haystack or the search part... (Please do explain if you understand what happens after the pipe)
find . -type f -name "*.jpg" -print0 | xargs -0 rename "s/Image_200x200_(\d{3})/img/"
So the goal is to have find give rename the original image name, and replace everything before the last underscore with img.
Yes I know that duplicates will give a problem, and yes I know that spaces in the name will also make my life hell, and don't even start with sub directories and the like. To keep it simple, we are talking about a single directory, and all filename are unique and without special characters.
I need to understand the fundamental basics, before getting to the hardcore stuff. Anybody out there feel like helping?
Another approach is to avoid using rename -- bash is capable enough:
find ... -print0 | while read -r -d '' filename; do
mv "$filename" "img_${filename##*_}"
done
The ##*_ part removes all leading characters up to and including the last underscore from the value.
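Filled in for the example files, that might look like this (a sketch; -iname is GNU find's case-insensitive variant of -name, which also covers the aside in the question):
find . -type f -iname '*.jpg' -print0 | while IFS= read -r -d '' filename; do
    # keep only the part after the last underscore, prefixed with img_
    mv "$filename" "img_${filename##*_}"
done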
If you don't need -print0 (i.e. you are sure your filenames don't contain newlines), you can just do:
find . -type f -name "*.jpg" | xargs rename 's/Image_200x200_(\d{3})/img/'
Which works for me:
~/tmp$ touch Image_200x200_001_BlueHat.jpg
~/tmp$ touch Image_200x200_002_RedHat.jpg
~/tmp$ touch Image_200x200_100_MyCat.jpg
~/tmp$ find . -type f -name "*.jpg" | xargs rename 's/Image_200x200_(\d{3})/img/'
~/tmp$ ls
img_BlueHat.jpg img_MyCat.jpg img_RedHat.jpg
What's happening after the pipe is that xargs is parsing the output of find and passing that in reasonable chunks to a rename command, which is executing a regex on the filename and renaming the file to the result.
update: I didn't try your version with the null-terminators at first, but it also works for me. Perhaps you tested with a different regex?
What's happening after the pipe:
find ... -print0 | xargs -0 rename "s/Image_200x200_(\d{3})/img/"
xargs is reading the filenames produced by the find command, and executing the rename command repeatedly, appending a few filenames at a time. The net effect will be something like:
rename '...' file001 file002 file003 file004 file005 file006 file007 file008 file009 file010
rename '...' file011 file012 file013 file014 file015 file016 file017 file018 file019 file020
rename '...' file021 file022 file023 file024 file025 file026 file027 file028 file029 file030
...
rename '...' file091 file092 file093 file094 file095 file096 file097 file098 file099 file100
The find -print0 | xargs -0 is a handy combination for more safely handling files that may contain whitespace.
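You can watch the batching happen with a quick test, here forcing tiny chunks with -n 2 (the rename and file names are placeholders):
$ printf '%s\n' file001 file002 file003 file004 file005 | xargs -n 2 echo rename '...'
rename ... file001 file002
rename ... file003 file004
rename ... file005
Without -n, xargs packs as many arguments per invocation as the system's command-line length limit allows.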

find and replace within file

I have a requirement to search for a pattern which is something like:
timeouts = {default = 3.0; };
and replace it with
timeouts = {default = 3000.0;.... };
i.e. multiply the timeout by a factor of 1000.
Is there any way to do this for all files in a directory?
EDIT:
Please note that some of the files in the directory are symlinks. Is there any way to get this done for symlinks also?
Please note that timeouts also exists as a substring elsewhere in the files, so I want to make sure that only this line gets replaced. Any solution using sed, awk or perl is acceptable.
Give this a try:
for f in *
do
sed -i 's/\(timeouts = {default = [0-9]\+\)\(\.[0-9]\+;\)\( };\)/\1000\2....\3/' "$f"
done
It will make the replacements in place for each file in the current directory. Some versions of sed require a backup extension after the -i option. You can supply one like this:
sed -i .bak ...
Some versions don't support in-place editing. You can do this:
sed '...' "$f" > tmpfile && mv tmpfile "$f"
Note that this is obviously not actually multiplying by 1000, so if the number is 3.1 it would become "3000.1" instead of 3100.0.
You can do this:
perl -pi -e 's/(timeouts\s*=\s*\{default\s*=\s*)([0-9.-]+)/$1 . $2 * 1000/e' *
One suggestion for whichever solution above you decide to use: it may be worth thinking through how you could refactor so that you don't have to modify all of these files for a change like this again.
Do all of these scripts have similar functionality?
Can you create a module that they would all use for shared subroutines?
In the module, could you have a single line that would allow you to have a multiplier?
For me, anytime I need to make similar changes in more than one file, it's the perfect time to be lazy to save myself time and maintenance issues later.
$ perl -pi.bak -e 's/\w+\s*=\s*{\s*\w+\s*=\s*\K(-?[0-9.]+)/sprintf "%0.1f", 1000 * $1/eg' *
Notes:
The regex matches just the number (see \K in perlre)
The /e means the replacement is evaluated
I include a sprintf in the replacement just in case you need finer control over the formatting
Perl's -i can operate on a bunch of files
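A quick way to preview the effect on the sample line from the question (a one-off test, leaving out -i so no file is modified):
$ echo 'timeouts = {default = 3.0; };' | perl -pe 's/\w+\s*=\s*{\s*\w+\s*=\s*\K(-?[0-9.]+)/sprintf "%0.1f", 1000 * $1/eg'
timeouts = {default = 3000.0; };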
EDIT
It has been pointed out that some of the files are symbolic links. Given that this process is not idempotent (running it twice on the same file is bad), you had better generate a unique list of files in case one of the links points to a file that appears elsewhere in the list. Here is an example with find, though the code for a pre-existing list should be obvious.
$ find -L . -type f -exec realpath {} \; | sort -u | xargs -d '\n' perl ...
(Assumes none of your filenames contain a newline!)
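For completeness, plugging the earlier one-liner into that pipeline might look like this (a sketch; find -L, realpath and xargs -d assume GNU tools):
$ find -L . -type f -exec realpath {} \; | sort -u | xargs -d '\n' perl -pi.bak -e 's/\w+\s*=\s*{\s*\w+\s*=\s*\K(-?[0-9.]+)/sprintf "%0.1f", 1000 * $1/eg'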