I would like to copy some files in a directory, renaming the files but preserving the extension. Is this possible with a simple cp, using a regex?
For example:
cp ^myfile\.(.*) mydir/newname.$1
That way I could copy the file, keeping the extension but renaming it. Is there a way to capture matched groups in the cp pattern and reuse them in the command?
If not, I think I'll write a Perl script, unless you have another way...
Thanks
Suppose you have myfile.a, myfile.b, myfile.c:
for i in myfile.*; do echo mv "$i" "${i/myfile./newname.}"; done
This creates (upon removal of echo) newname.a, newname.b, newname.c.
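If you actually want a copy rather than a rename, and want to keep whatever extension each file has, a minimal sketch along the same lines (assuming the target directory mydir already exists) would be:
for f in myfile.*; do cp -- "$f" "mydir/newname.${f##*.}"; done
Here ${f##*.} is the part after the last dot. As above, put an echo in front of cp first if you want to see the commands before they run.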
The shell doesn't understand general regexes; you'll have to outsource to auxiliary programs for that. The classical scripty way to solve your task would be something like
for a in myfile.* ; do
  b=$(echo "$a" | sed 's!^myfile!mydir/newname!')
  cp "$a" "$b"
done
Or have a perl script generate a list of commands that you then source into the shell.
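For example, something like this prints the commands so you can inspect them; append | sh once they look right (just a sketch of that idea):
perl -e 'for (glob "myfile.*") { ($new = $_) =~ s!^myfile!mydir/newname!; print qq{cp "$_" "$new"\n} }'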
I really like the regex syntax of the rename perl script (by Robin Barker and Larry Wall), e.g.:
rename "s/OldFile/NewFile/" OldFile*
OldFile.c and OldFile.h are renamed to NewFile.c and NewFile.h, respectively.
I simply wanted the exact same thing with a copy command:
copy "s/OldFile/NewFile/" OldFile*
So I duplicated that script and changed the rename statement to copy via File::Copy. Et voila! A copy command with perl-regex syntax:
https://gist.github.com/jcward/0ead33bd79f2061c68728cc82582241f
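If you don't want to keep a separate script around, a rough shell equivalent of that particular example (a plain prefix swap, so no full regex support) is:
for f in OldFile*; do cp -- "$f" "${f/OldFile/NewFile}"; done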
I'm fairly new to the whole coding game, and am very grateful for every answer!
I am working in a directory with many .txt files and have a file with a looong list of regexes like "perl -p -i -e 's/\n\n/\n/g' *.xml". They all work if I copy them to the terminal, but is there a possibility to run them straight from the file?
I tried ./unicode.sh but that resulted in:
No such file or directory.
Any ideas?
Thank you so much!
Here's a (mostly) equivalent Perl script to the one-liner perl -p -i -e 's/\n\n/\n/g' *.xml (one main difference being that this has strict and warnings enabled, which is strongly recommended). You could expand on it by putting more code to modify the current line in the body of the while loop.
#!/usr/bin/env perl
use warnings;
use strict;
if (!@ARGV) {                # if no files on command line
    @ARGV = glob('*.xml');   # get a default list of files
}
local $^I = ''; # enable inplace editing (like perl -i)
while (<>) {         # read each line of each file into $_
    s/\n\n/\n/g;     # modify $_ with a regex
    # more regexes here...
    print;           # write the line $_ back out
}
You can save this script in a file such as process.pl, and then run it with perl process.pl, or do chmod u+x process.pl and then run it via ./process.pl.
On the other hand, you really shouldn't modify XML files with regular expressions; there are lots of Perl modules for XML processing - I wrote about that some more here. Also, in the example you showed, s/\n\n/\n/g actually won't have any effect, since when reading files line by line, no string will contain two \n's (you can change how Perl reads files, but I don't see any mention of that in the question).
Edit: You've named the script in your example unicode.sh - if you're processing Unicode files, then Perl has very powerful features to help with that, although the code won't necessarily end up as nice and short as what I've shown above. You'll have to tell us some more about what you're doing, and show some example input and output, to get suggestions about that. See also e.g. perlunitut.
If you got "No such file or directory", it's likely that your problem was forgetting to make unicode.sh executable, as in chmod +x unicode.sh, assuming that's a script you wrote.
Of course, the normal way to run multiple Perl commands is to put them into something like runme.pl that you write yourself, i.e., a Perl script.
That said, yes, everything will work just as it does from the terminal; you just need to be careful about the escaping that bash performs.
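For example, assuming unicode.sh is simply meant to hold the one-liners from your list, a minimal sketch could be:
#!/usr/bin/env bash
set -e                               # stop at the first command that fails
perl -p -i -e 's/\n\n/\n/g' *.xml    # paste each one-liner exactly as it worked in the terminal
# ...more one-liners here...
Then run it with bash unicode.sh, or do chmod +x unicode.sh once and start it with ./unicode.sh.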
This should be a basic question for a lot of people, but I am a biologist with no programming background, so please excuse my question.
What I am trying to do is rename about 100,000 gzipped data files that have existing name of a code (example: XG453834.fasta.gz). I'd like to name them to something easily readable and parseable by me (example: Xanthomonas_galactus_str_453.fasta.gz).
I've tried to use sed, rename, and mmv, to no avail. If I use any of those commands as a one-off then they work fine; it's just when I try to incorporate variables into a shell script that I run into problems. I'm not getting any errors, just no names are changed, so I suspect it's an I/O error.
Here's what my script looks like:
#! /bin/bash
# change a bunch of file names
file=names.txt
while IFS=' ' read -r r1 r2;
do
mmv ''$r1'.fasta.gz' ''$r2'.fasta.gz'
# or I tried many versions of: sed -i 's/"$r1"/"$r2"/' *.gz
# and I tried many versions of: rename -i 's/$r1/$r2/' *.gz
done < "$file"
...and here's the first lines of my txt file with single space delimiter:
cat names.txt
#find #replace
code1 name1
code2 name2
code3 name3
I know I can do this with python or perl, but since I'm stuck here working on this particular script I want to find a simple solution to fixing this bash script and figure out what I am doing wrong. Thanks so much for any help possible.
Also, I tried to cat the names file (see comment from Ashoka Lella below) and then use awk to move/rename. Some of the files have variable names (but will always start with the code), so I am looking for a find & replace option to just replace the "code" with the "name" and preserve the file name structure.
I suspect I am not escaping the variable within the single quotes of the perl expression, but I have pored over a lot of manuals and I can't find the way to do this.
If you're absolutely sure that the filenames don't contain spaces or tabs, you can try the following:
xargs -n2 < names.txt echo mv
This is a dry run (it will only print what it would do); if you're satisfied with the result, remove the echo.
If you want to be prompted before overwriting an existing target, use
xargs -n2 < names.txt echo mv -i
If you never want to allow overwriting of the target, use
xargs -n2 < names.txt echo mv -n
Again, remove the echo once you're satisfied.
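Note that your names.txt as shown starts with a "#find #replace" header line and holds bare codes without the .fasta.gz suffix; assuming that's the case, one possible variant (still a dry run) is:
grep -v '^#' names.txt | xargs -n2 sh -c 'echo mv -- "$1.fasta.gz" "$2.fasta.gz"' sh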
I don't think that you need to be using mmv, a simple mv will do. Also, there's no need to specify the IFS, the default will work for you:
while read -r src dest; do mv "$src" "$dest"; done < names.txt
I have double-quoted the variable names as that is generally considered good practice, but in this case a space in either of the filenames will result in read not working as you expect.
You can put an echo before the mv inside the loop to ensure that the correct command will be executed.
Note that in your file names.txt, the .fasta.gz suffix is already included, so you shouldn't be adding it inside the loop as well. Perhaps that was your problem?
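If names.txt really does hold the codes without the suffix, as in your original script, a sketch that adds .fasta.gz back on (echo left in as a dry run) might look like this:
while read -r src dest; do
    case $src in '#'*) continue ;; esac               # skip the "#find #replace" header line
    echo mv -- "${src}.fasta.gz" "${dest}.fasta.gz"   # remove echo once the output looks right
done < names.txt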
This should rename all files named in column 1 to the corresponding names in column 2 of names.txt, provided they are in the same folder as names.txt:
cat names.txt | awk '{print "mv "$1" "$2}' | sh
I have been trying to learn how to adequately perform a single command multiple times using the command line. Although I have learned how to do a single command with no input and output files, it gets more complicated when the command needs them.
The cp command requires this, so let's use it as an example. I look for all images with a .png extension and copy them. The way I have come up with after using Google is:
find -regex ".*\.\(png\)" -exec cp {} {}3 \;
The only problem with that is that I have to rename the file with a figure after the name, so it gets renamed to something like file.png3 instead of file.png. I can't figure out how to do it differently, since putting the new figure before the name doesn't seem to work.
Is there a better way to do this or am I going about it completely the wrong way?
I'm not sure how you might do that in a single find command, but you could split it out. First, find the files with find. Then use sed to remove the .png extension. Finally, use xargs to run the copy function on each file. Like this:
find -regex ".*\.\(png\)" | sed -r 's/.png//g' | xargs -I {} cp {}.png {}_copy.png
If you didn't know, the pipe "|" will send the output of one program into the next.
Alternatively, you could just modify the beginning of the filename (so 3img.png instead of img.png3) or copy to a new folder.
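If you do want it as a single find command after all, one possible sketch (the _copy suffix is just an example) is:
find . -name '*.png' -exec sh -c 'for f; do cp -- "$f" "${f%.png}_copy.png"; done' sh {} +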
I am trying to write a bash shell script to rename a bunch of photos to my own numbering system. All image filenames are like "IMG_0000.JPG" and I can get the script to match and rename (overwrite) all the photos with the following Perl-regex code:
#!/bin/bash
rename -f 's/\w{4}\d{4}.JPG/replacement.jpg/' *.JPG;
But when I try to use a variable as the name of the replacement, as I keep seeing on other posts here and elsewhere on the internet, nothing happens:
#!/bin/bash
$replacement = "000.jpg";
rename -f 's/\w{4}\d{4}.JPG/$replacement/' *.JPG;
How can I get such a variable to work correctly in my bash script? (NOTE: I am not looking to simply strip the "IMG_" from the filename)
Take the replacement out of single quotes:
#!/bin/bash
$replacement="000.jpg"
rename -f 's/\w{4}\d{4}.JPG/'$replacement'/' *.JPG
Bash does not inspect single quoted strings for interpolation.
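A quick way to see the difference between the two quoting styles (just an illustration, unrelated to rename itself):
name=world
echo 'hello $name'    # prints: hello $name
echo "hello $name"    # prints: hello world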
Using double quotes and correct variable assignment:
#!/bin/bash
replacement="000.jpg"
rename -f "s/\w{4}\d{4}\.JPG/$replacement/" *.JPG
Note that this can cause trouble, e.g. when renaming two files with names like IMG_0001.JPG and FOO_9352.JPG: The first file will be renamed to 000.jpg, then the second file will also be renamed to 000.jpg, overwriting the first.
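Since a fixed replacement like 000.jpg maps every matching file to the same name, here is a sketch of what a sequential numbering scheme might look like instead (the %03d format and the starting value are assumptions, not part of the question):
n=0
for f in IMG_????.JPG; do
    printf -v new '%03d.jpg' "$n"
    n=$((n + 1))
    echo mv -- "$f" "$new"    # remove echo once the mapping looks right
done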
I have a requirement to search for a pattern which is something like :
timeouts = {default = 3.0; };
and replace it with
timeouts = {default = 3000.0;.... };
i.e. multiply the timeout by a factor of 1000.
Is there any way to do this for all files in a directory?
EDIT:
Please note that some of the files in the directory are symlinks. Is there any way to get this done for the symlinks as well?
Please note that "timeouts" also occurs as a substring elsewhere in the files, so I want to make sure that only this line gets replaced. Any solution using sed, awk, or perl is acceptable.
Give this a try:
for f in *
do
sed -i 's/\(timeouts = {default = [0-9]\+\)\(\.[0-9]\+;\)\( };\)/\1000\2....\3/' "$f"
done
It will make the replacements in place for each file in the current directory. Some versions of sed require a backup extension after the -i option. You can supply one like this:
sed -i .bak ...
Some versions don't support in-place editing. You can do this:
sed '...' "$f" > tmpfile && mv tmpfile "$f"
Note that this is obviously not actually multiplying by 1000, so if the number is 3.1 it would become "3000.1" instead of 3100.0.
You can do this:
perl -pi -e 's/(timeouts\s*=\s*\{default\s*=\s*)([0-9.-]+)/$1 . $2 * 1000/e' *
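Before running it with -i on real files, you can sanity-check the substitution on one sample line (nothing is written to disk):
echo 'timeouts = {default = 3.0; };' | perl -pe 's/(timeouts\s*=\s*\{default\s*=\s*)([0-9.-]+)/$1 . $2 * 1000/e'
# prints: timeouts = {default = 3000; };
If you need the trailing .0 in the output, the sprintf-based answer below gives you control over the formatting.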
One suggestion for whichever solution above you decide to use - it may be worth it to think through how you could refactor to avoid having to modify all of these files for a change like this again.
Do all of these scripts have similar functionality?
Can you create a module that they would all use for shared subroutines?
In the module, could you have a single line that would allow you to have a multiplier?
For me, anytime I need to make similar changes in more than one file, it's the perfect opportunity to be lazy and save myself time and maintenance issues later.
$ perl -pi.bak -e 's/\w+\s*=\s*{\s*\w+\s*=\s*\K(-?[0-9.]+)/sprintf "%0.1f", 1000 * $1/eg' *
Notes:
The regex matches just the number (see \K in perlre)
The /e means the replacement is evaluated
I include a sprintf in the replacement just in case you need finer control over the formatting
Perl's -i can operate on a bunch of files
EDIT
It has been pointed out that some of the files are symbolic links. Given that this process is not idempotent (running it twice on the same file is bad), you had better generate a unique list of files in case one of the links points to a file that appears elsewhere in the list. Here is an example with find, though the code for a pre-existing list should be obvious.
$ find -L . -type f -exec realpath {} \; | sort -u | xargs -d '\n' perl ...
(Assumes none of your filenames contain a newline!)