Replace and add leading zeros when renaming files - regex

Please be patient, this post will be somewhat long...
I have a bunch of files, some of them with a simple and clean name (e.g. 1E01.txt) and some with a lot of extras:
Sample2_Name_E01_-co_032.txt
Sample2_Name_E02_-co_035.txt
...
Sample12_Name_E01_-co_061.txt
and so on. What is important here is the number after "Sample" and the letter+number after "Name"; the rest is disposable. If I get rid of the unimportant parts, the filename reduces to the same pattern as the "clean" filenames (2E01.txt, 2E02.txt, ..., 12E01.txt). I've managed to rename the files with the following expression (I came up with this one myself; I don't know if it is very elegant, but it works fine):
rename -v 's/Sample([0-9]+)_Name_([A-Z][0-9]+).*/$1$2\.txt/' *.txt
Now, the second part is adding a leading zero to filenames whose number has just one digit, so that 1E01.txt turns into 01E01.txt. I've managed to do this with the following (found on another StackExchange post and modified):
rename -v 'unless (/^[0-9]{2}.*\.txt/) {s/^([0-9]{1}.*\.txt)$/0$1/;s/0*([0-9]{2}\..*)/$1/}' *.txt
So I finally got to my question: is there a way to merge both expressions in just one rename command? I know I could do a bash script to automate the process, but what I want is to find a one-pass renaming solution.
thanks

You can try this command to rename 1-file.txt to 0001-file.txt
# fill zeros
$ rename 's/\d+/sprintf("%04d",$&)/e' *.txt
You can change the command a little to meet your need.

Well, if that is your "parsing" regex, then you are already limiting the files the script can act on to those matching that pattern. Reusing the same literal strings in the sprintf therefore does not make it any more specialized, and you could just do this:
s{Sample(\d+)_Name_(\p{IsUpper})(\d+)}
{sprintf "Sample%02d_Name_%s%03d", $1, $2, $3}e
;
Here, you are using the same known features again and simply formatting the accompanying numbers.
The /e switch is for 'eval' and it evaluates the replacement as Perl for each match.
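I replaced some of your expressions with more standard character class symbols: [A-Z] becomes the property class \p{IsUpper}, and [0-9] becomes the digit class \d (\p{IsDigit} is also possible). These are Unicode-aware classes, so they can match more than the ASCII ranges they replace.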
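For the one-pass rename the question asks about, both steps (stripping the extras and padding the number) can be folded into one function. The sketch below uses plain bash rather than rename, which is a swapped-in technique and not what the answers above use; normalize_name is a hypothetical helper name, and the two patterns assume the file names look exactly like the samples in the question.

```shell
#!/usr/bin/env bash
# normalize_name is a hypothetical helper: it prints the target name for one
# file name and leaves names that match neither pattern unchanged.
normalize_name() {
    local name=$1
    local long='^Sample([0-9]+)_Name_([A-Z][0-9]+).*\.txt$'
    local short='^([0-9]+)([A-Z][0-9]+)\.txt$'
    if [[ $name =~ $long || $name =~ $short ]]; then
        # 10# forces base 10, so a number such as 08 is not parsed as octal
        printf '%02d%s.txt\n' "$((10#${BASH_REMATCH[1]}))" "${BASH_REMATCH[2]}"
    else
        printf '%s\n' "$name"
    fi
}

normalize_name 'Sample2_Name_E01_-co_032.txt'   # prints 02E01.txt
normalize_name '1E01.txt'                       # prints 01E01.txt

# Dry run: print the mv commands instead of executing them.
for f in *.txt; do
    new=$(normalize_name "$f")
    if [[ $f != "$new" ]]; then echo mv -- "$f" "$new"; fi
done
```

Remove the echo in the loop once the printed commands look right.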

Related

Is there a bash script for finding a specific character between two given expressions?

I have a 3-step problem: I need to
find all occurrences of the character : in a latex file but only when it is in a \ref{} or in a \label{}, in which there can be other characters. Example: The system's total energy (\ref{eq:E}).
replace those : with _. Example becomes: The system's total energy (\ref{eq_E}).
do this for all such occurrences of : in references or labels, in about 100 files.
I've never done this before. I've worked out that I can use regular expressions to find complex occurrences. I can find either \ref{ or \label{ with (\\ref\{|\\label\{), but I can't put it in a lookbehind because it is not fixed width. My other problem with lookbehind and lookahead is that I can only match everything between my assertions, not specific characters (from what I've understood).
I've also worked out that I can use sed for find and replace. I was planning on using a regular expression as my sed "find". Does that make sense?
And finally, I'm not sure how to go about looping on all my files (which have ordered names). Can I do an if or while loop in a bash script?
I know that my questions are all over the place, as I said, never done this before and there is a mountain of documentation I'm only beginning to tackle. Any help or pointers would be appreciated.
You can use the following command, which relies on capturing groups to extract the different parts of a ref or label containing a colon and replace it with the equivalent using an underscore:
sed -E 's/\\(ref|label)\{([^:}]*):([^}]*)}/\\\1{\2_\3}/g'
The expression captures the whole ref or label tag, matching the tag name in the first capturing group, the part that precedes the colon in the second capturing group and the part that follows the colon in the third capturing group. The replacement pattern uses references to these capturing groups and can be read as \<tagName>{<before colon>_<after colon>}.
Note that it would be preferable to use a parser that understands the LaTeX format; the regex is likely to fail for some edge cases.
As for your last question, about looping over all your files: you do not need an explicit loop.
sed accepts a list of files as parameters and will apply its command to all of them. The list can be produced by the expansion of a glob, e.g. sed 'sedCommand' /your/directory/*.txt, which would work on all files in /your/directory/ whose names end in .txt.
In this case you will likely want to use sed's -i ("in place") flag, which asks sed to write its result directly to the target file rather than to its standard output. The flag can be followed by a suffix if you want a backup of the original; for instance, sed -i.bak 'command' file.txt will leave the result in file.txt and the original in file.txt.bak.
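Putting the pieces together, here is a minimal sketch of an in-place run over several files. It assumes a sed that accepts -E and -i with an attached suffix, and it uses a small loop (label a, then ta) so that a reference containing more than one colon is fully rewritten. fix_labels, the temporary directory and ch01.tex are made-up names for illustration.

```shell
#!/usr/bin/env bash
# fix_labels is a hypothetical helper: it rewrites ':' to '_' inside
# \ref{...} and \label{...} in every file given, keeping each original
# as <file>.bak.
fix_labels() {
    # ':a' ... 'ta' repeats the substitution until no colon is left inside
    # a reference; [^}:]* keeps the match from running past the closing brace.
    sed -E -i.bak -e ':a' -e 's/(\\(ref|label)\{[^}:]*):/\1_/' -e 'ta' "$@"
}

# Demo on a throwaway file (directory and file name are made up).
demo_dir=$(mktemp -d)
printf '%s\n' 'Energy (\ref{eq:E}), see \label{sec:intro:a}; ratio 1:2.' > "$demo_dir/ch01.tex"
fix_labels "$demo_dir"/*.tex
cat "$demo_dir/ch01.tex"
# prints: Energy (\ref{eq_E}), see \label{sec_intro_a}; ratio 1:2.
```

The colon in "ratio 1:2" is untouched because it is not inside a \ref or \label, and the .bak copies make it easy to diff before deleting them.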

Linux: rename files containing ASCII-Code for capital letters

I have a collection of files where the capital letters are replaced by their ASCII-code (example ;065 for A). How can I most effectively recursively rename them from the command line?
I don't want to make the mess worse, and unfortunately I don't know how to test any commands...
For me it would be no problem to modify the command for each letter.
Many Linux distributions ship some variant or another of the Perl rename script, sometimes as prename, sometimes as rename. Any variant will do, but not the Linux rename utility that isn't written in Perl (run it with no argument and see if the help text mentions perl anywhere). This script runs Perl code on file names, typically a regex replacement.
prename -n 's/;(03[2-9]|0[4-9][0-9]|1[01][0-9]|12[0-6])/chr($1)/eg' *
I made a regular expression that matches three-digit numbers that are the character code of a printable ASCII character. You may need to adjust it depending on exactly what can follow a semicolon. The * at the end says to rename all files in the current directory, it's just a normal shell wildcard. It's ok to include files that don't contain anything to rename: prename will just skip them.
The -n option says to show what would be done, but don't actually rename any file. Review the output. If you're happy with it, run the command again without -n to actually rename the files.
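If no Perl rename is available, or the rename has to go through subdirectories, the same decoding can be sketched in plain bash. This is an alternative technique, not the prename command above. It is restricted to the codes for capital letters (;065 to ;090), as described in the question, which also avoids producing characters such as "/" that cannot appear in a file name. decode_name and preview_tree are hypothetical helper names.

```shell
#!/usr/bin/env bash
# decode_name is a hypothetical helper: it replaces every ;065 .. ;090 in a
# name with the capital letter that has that ASCII code.
decode_name() {
    local name=$1 re=';(0(6[5-9]|[78][0-9]|90))' code char
    while [[ $name =~ $re ]]; do
        code=${BASH_REMATCH[1]}
        # decimal code -> octal escape -> character, e.g. 065 -> \101 -> A
        printf -v char "\\$(printf '%03o' "$((10#$code))")"
        name=${name//";$code"/$char}
    done
    printf '%s\n' "$name"
}

decode_name ';072ello_;087orld.txt'    # prints Hello_World.txt

# preview_tree is a hypothetical dry run: it walks a directory tree with the
# deepest entries first and prints the mv commands without running them.
preview_tree() {
    local path base new
    find "$1" -depth -name '*;0[6-9][0-9]*' -print0 |
    while IFS= read -r -d '' path; do
        base=${path##*/}
        new=$(decode_name "$base")
        if [[ $base != "$new" ]]; then
            echo mv -- "$path" "${path%/*}/$new"
        fi
    done
}
```

Calling preview_tree . lists the planned renames for the current directory and everything below it; replacing the echo with the bare mv would carry them out.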

How to rename a file using regex capture group in Linux?

I want to rename a_1.0.tgz to b_1.0.tgz, since 1.0 may be changed to any version number, how can I achieve that?
For example, I can use mv a*.tgz b.tgz if I don't need to keep the version number.
zsh comes with the utility zmv, which is intended for exactly that. While zmv does not support regex, it does provide capture groups for filename generation patterns (aka globbing).
First, you might need to enable zmv. This can be done by adding the following to your ~/.zshrc:
autoload -Uz zmv
You can then use it like this:
zmv 'a_(*)' 'b_$1'
This will rename any file matching a_* so that a_ is replaced by b_. If you want to be less general, you can of course adjust the pattern:
to rename only .tgz files:
zmv 'a_(*.tgz)' 'b_$1'
to rename only .tgz files while changing the extension to .tar.gz
zmv 'a_(*).tgz' 'b_$1.tar.gz'
to only rename a_1.0.tgz:
zmv 'a_(1.0.tgz)' 'b_$1'
To be on the safe side, you can run zmv with the option -n first. This will only print what would happen, without actually changing anything. For more information, have a look at man zshcontrib.
I'm not too familiar with zsh so I don't know if it supports regular expressions but I don't think you really need them here.
You can match the file using a glob and use a substitution:
for file in a_[0-9].[0-9].tgz; do
echo "$file" "${file/a/b}"
done
In the glob pattern, [0-9] matches any single digit, so this pattern only covers versions of the form N.N. ${file/a/b} substitutes the first occurrence of a with b.
Change the echo to mv if you're happy with the result.
Assuming you would like to replace the first character in all files matching a*.tgz with the letter b:
for f in a*.tgz; do
echo mv "$f" "b${f:1}"
done
Remove the echo when you are certain that this does what you want it to do.
The ${f:1} uses the ${name:offset} parameter expansion. From the zshexpn manual (on OS X):
If offset is non-negative, then if the variable name is a
scalar substitute the contents starting offset characters
from the first character of the string, [...]
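The expansions used in the two loops above can be compared side by side. The sketch below also adds a third variant, ${f#a_}, which strips a literal prefix; that variant and the sample name a_1.10.tgz are additions for illustration, not taken from the answers.

```shell
#!/usr/bin/env bash
f='a_1.0.tgz'
echo "${f/a/b}"     # replaces the first "a" found anywhere: b_1.0.tgz
echo "b${f:1}"      # drops the first character, prepends b: b_1.0.tgz
echo "b_${f#a_}"    # strips the leading "a_", prepends b_:  b_1.0.tgz

# a_[0-9].[0-9].tgz only matches single-digit version parts,
# while a_*.tgz accepts any version string.
v='a_1.10.tgz'
[[ $v == a_[0-9].[0-9].tgz ]] && echo "narrow glob matches $v"
[[ $v == a_*.tgz ]] && echo "wide glob matches $v"     # only this line prints

# Dry run over the wider glob; drop the echo to rename for real.
for f in a_*.tgz; do
    [[ -e $f ]] || continue     # skip the literal pattern when nothing matches
    echo mv -- "$f" "b_${f#a_}"
done
```

All three expansions give the same result here because the name starts with a_; they differ only for names where the first "a" is not the first character.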

Keep the first character of a sed regex match

I'm trying to use the rename perl command in Debian to rename files and remove detritus from the end of the filename.
The file names may be like this (varying length/nodes before the series/episode identifier)
A.TV.Show.S01E01.HDTV.XVid[stuff].avi
Other.Prog.S07E09.WEB.H264[things].mp4
And I want to remove everything after the SnnEnn bit and keep the file extension. For example
A.TV.Show.S01E01.avi
Other.Prog.S07E09.mp4
I don't mind having a command per file extension, although a single command that is extension agnostic would be better.
What I have so far is as follows:
rename -nv -- 's/[0-9][.].*?[.]avi$/.avi/' *.avi
I'm using -n just now so it just shows what the rename would do, without doing it.
The problem is it's losing the number at the end of the series and episode identifier - I need it to keep the first character of the matched text then throw the rest away.
What it gives me currently is files named thus:
A.TV.Show.S01E0.avi
Other.Prog.S07E0.mp4
Any idea how to do this? Is there a better pattern than I'm using?
This should work. It's capturing the part that you want to keep in parentheses, and then referring to it in the replacement as $1.
rename -nv -- 's/(^.*?S\d{2}E\d{2})\..*?\.([^.]+)$/$1.$2/' *
You need parentheses to capture parts of the strings:
s/([0-9])[.].*?[.]([^.]+)$/$1.$2/
or, you can use a look-behind instead of the first capture:
s/(?<=[0-9])[.].*?[.]([^.]+)$/.$1/
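The same capture-and-keep idea can be checked without renaming anything by computing the new name in plain bash. This is only a sketch under the assumption that every name contains an SnnEnn block followed by at least one dotted field before the extension; strip_tags is a hypothetical helper name, and bash's regex has no lazy quantifier, so it anchors on the last SnnEnn in the name.

```shell
#!/usr/bin/env bash
# strip_tags is a hypothetical helper: it keeps everything up to SnnEnn plus
# the final extension, and prints names that do not fit the pattern unchanged.
strip_tags() {
    local name=$1
    local re='^(.*S[0-9]{2}E[0-9]{2})\..*\.([^.]+)$'
    if [[ $name =~ $re ]]; then
        printf '%s.%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
    else
        printf '%s\n' "$name"
    fi
}

strip_tags 'A.TV.Show.S01E01.HDTV.XVid[stuff].avi'    # prints A.TV.Show.S01E01.avi
strip_tags 'Other.Prog.S07E09.WEB.H264[things].mp4'   # prints Other.Prog.S07E09.mp4
```

Because the extension is captured rather than spelled out, one pattern covers .avi, .mp4 and any other extension.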

Rename Files Mac Command Line

I have a bunch of files in a directory that were produced with rather unfortunate names. I want to change two of the characters in the name.
For example I have:
>ch:sdsn-sdfs.txt
and I want to remove the ">" and change the ":" to a "_".
Resulting in
ch_sdsn-sdfs.txt
I tried to just say mv \\>ch\:* ch_* but that didn't work.
Is there a simple solution to this?
For a command-line script to rename the files, this Stack Overflow question has good answers.
On a Mac, Finder's GUI comes with bulk rename capabilities, which are very handy when the file names share a pattern to find and replace:
Select all the files that need to be renamed, right-click and select Rename.
In the Rename dialog, enter the find and replace strings.
The same dialog has other options to number the file names sequentially and to add a prefix or suffix.
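For the specific case in the question (drop a leading ">" and turn ":" into "_"), bash parameter expansion is enough. This is a minimal sketch; clean_name is a hypothetical helper name, and the loop only prints the mv commands so the result can be reviewed first.

```shell
#!/usr/bin/env bash
# clean_name is a hypothetical helper: it removes one leading ">" and
# replaces every ":" with "_".
clean_name() {
    local name=${1#'>'}
    printf '%s\n' "${name//:/_}"
}

clean_name '>ch:sdsn-sdfs.txt'    # prints ch_sdsn-sdfs.txt

# Dry run over every file in the current directory whose name starts with ">".
for f in ./'>'*; do
    [[ -e $f ]] || continue       # skip the literal pattern when nothing matches
    echo mv -- "$f" "./$(clean_name "${f#./}")"
done
```

The ./ prefix and the quoted '>' keep the shell from treating the character as a redirection; remove the echo to perform the renames.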
First, I should say that the easiest way to do this is to use the
prename or rename commands.
Homebrew package rename, MacPorts package renameutils:
rename s/0000/000/ F0000*
That's a lot more understandable than the equivalent sed command.
But as for understanding the sed command, the sed manpage is helpful. If
you run man sed and search for & (using the / command to search),
you'll find it's a special character in s/foo/bar/ replacements.
s/regexp/replacement/
Attempt to match regexp against the pattern space. If successful,
replace that portion matched with replacement. The replacement may
contain the special character & to refer to that portion of the
pattern space which matched, and the special escapes \1 through \9
to refer to the corresponding matching sub-expressions in the regexp.
Therefore, \(.\) matches the first character, which can be referenced by \1.
Then . matches the next character, which is always 0.
Then \(.*\) matches the rest of the filename, which can be referenced by \2.
The replacement string puts it all together using & (the original
filename) and \1\2 which is every part of the filename except the 2nd
character, which was a 0.
This is a pretty cryptic way to do this, IMHO. If for
some reason the rename command was not available and you wanted to use
sed to do the rename (or perhaps you were doing something too complex
for rename?), being more explicit in your regex would make it much
more readable. Perhaps something like:
ls F00001-0708-*|sed 's/F0000\(.*\)/mv & F000\1/' | sh
Being able to see what's actually changing in the
s/search/replacement/ makes it much more readable. Also it won't keep
sucking characters out of your filename if you accidentally run it
twice or something.
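The difference between the two styles is easy to see by printing the generated mv commands instead of piping them to sh. In the sketch below the file name is made up, and the positional expression is a reconstruction from the description above (first character, any one character, the rest), not a command quoted from the original post.

```shell
#!/usr/bin/env bash
# Made-up file name for illustration.
name='F00001-0708-sample.txt'

# Positional form (reconstructed): keep char 1, drop char 2, keep the rest.
positional=$(printf '%s\n' "$name" | sed 's/\(.\).\(.*\)/mv & \1\2/')
# Explicit form from the answer: replace the literal F0000 with F000.
explicit=$(printf '%s\n' "$name" | sed 's/F0000\(.*\)/mv & F000\1/')
echo "$positional"   # mv F00001-0708-sample.txt F0001-0708-sample.txt
echo "$explicit"     # mv F00001-0708-sample.txt F0001-0708-sample.txt

# Second run on the already-renamed file: the positional form strips another
# character, while the explicit form no longer matches and leaves the line alone.
renamed='F0001-0708-sample.txt'
again_positional=$(printf '%s\n' "$renamed" | sed 's/\(.\).\(.*\)/mv & \1\2/')
again_explicit=$(printf '%s\n' "$renamed" | sed 's/F0000\(.*\)/mv & F000\1/')
echo "$again_positional"   # mv F0001-0708-sample.txt F001-0708-sample.txt
echo "$again_explicit"     # F0001-0708-sample.txt
```

Note that a line sed leaves unchanged would still be passed to sh as if it were a command, so reviewing the generated lines before adding the final | sh is a sensible habit.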