I have a bunch of files in a directory that were produced with rather unfortunate names. I want to change two of the characters in the name.
For example I have:
>ch:sdsn-sdfs.txt
and I want to remove the ">" and change the ":" to a "_".
Resulting in
ch_sdsn-sdfs.txt
I tried to just say mv \>ch\:* ch_* but that didn't work.
Is there a simple solution to this?
For a command-line script to do the rename, this Stack Overflow question has good answers.
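For the >ch:sdsn-sdfs.txt example in the question, a plain shell loop is probably enough. The sketch below is one possible approach and is not taken from the linked answers: fix_name and rename_in_dir are hypothetical helper names, and it assumes the affected files all sit directly in one directory.

```shell
#!/bin/sh
# fix_name: print NAME with a leading ">" removed and every ":" turned into "_".
fix_name() {
    printf '%s\n' "$1" | sed -e 's/^>//' -e 's/:/_/g'
}

# rename_in_dir: apply fix_name to every entry directly inside DIR.
rename_in_dir() {
    dir=$1
    for entry in "$dir"/*; do
        [ -e "$entry" ] || continue       # the glob matched nothing
        old=${entry##*/}                  # file name without the directory
        new=$(fix_name "$old")
        if [ "$old" != "$new" ]; then
            # -i asks before overwriting an existing target
            mv -i -- "$entry" "$dir/$new"
        fi
    done
}

# Usage, from the directory that holds the files:
#   rename_in_dir .
```

The original attempt could not work because mv never sees a wildcard in the destination: the shell expands ch_* (or leaves it as a literal string when nothing matches) before mv runs, so each file needs its new name computed individually.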
On a Mac, Finder has a built-in bulk rename. It is handy when the file names share a pattern that can be handled with find and replace.
Select all the files that need to be renamed, right-click and choose Rename.
In the rename dialog, enter the text to find and the text to replace it with.
The same dialog has other options: it can number the file names in sequence,
or add text as a prefix or suffix.
First, I should say that the easiest way to do this is to use the
prename or rename commands.
Homebrew package rename, MacPorts package renameutils :
rename s/0000/000/ F0000*
That's a lot more understandable than the equivalent sed command.
But as for understanding the sed command, the sed manpage is helpful. If
you run man sed and search for & (using the / command to search),
you'll find it's a special character in s/foo/bar/ replacements.
s/regexp/replacement/
Attempt to match regexp against the pattern space. If successful,
replace that portion matched with replacement. The
replacement may contain the special character & to refer to that
portion of the pattern space which matched, and the special
escapes \1 through \9 to refer to the corresponding matching
sub-expressions in the regexp.
Therefore, \(.\) matches the first character, which can be referenced by \1.
Then . matches the next character, which is always 0.
Then \(.*\) matches the rest of the filename, which can be referenced by \2.
The replacement string puts it all together using & (the original
filename) and \1\2 which is every part of the filename except the 2nd
character, which was a 0.
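The command being dissected is not quoted here, but from the description it presumably had the shape s/\(.\).\(.*\)/mv & \1\2/. That reconstruction is an assumption, and the file name below is made up. Feeding a single name through sed shows what &, \1 and \2 expand to without renaming anything:

```shell
# Assumed reconstruction of the cryptic command. Nothing is renamed:
# the generated mv command is only stored and printed.
cryptic=$(printf '%s\n' 'F00001-0708-a.txt' | sed 's/\(.\).\(.*\)/mv & \1\2/')
printf '%s\n' "$cryptic"
# &  is the whole name:        F00001-0708-a.txt
# \1 is the first character:   F
# \2 is everything after the
#    skipped second character: 0001-0708-a.txt
```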
This is a pretty cryptic way to do this, IMHO. If for
some reason the rename command was not available and you wanted to use
sed to do the rename (or perhaps you were doing something too complex
for rename?), being more explicit in your regex would make it much
more readable. Perhaps something like:
ls F00001-0708-*|sed 's/F0000\(.*\)/mv & F000\1/' | sh
Being able to see what's actually changing in the
s/search/replacement/ makes it much more readable. Also it won't keep
sucking characters out of your filename if you accidentally run it
twice or something.
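The difference on a second run can be checked without touching any files. A small sketch, using a made-up file name and sed -n with the p flag so that only names that actually match produce a command; explicit_mv and cryptic_mv are hypothetical helper names, and the cryptic form is the assumed shape of the original command:

```shell
# Turn file names read from stdin into mv commands.
explicit_mv() { sed -n 's/F0000\(.*\)/mv & F000\1/p'; }
cryptic_mv()  { sed -n 's/\(.\).\(.*\)/mv & \1\2/p'; }

# First run on the badly named file.
first=$(printf '%s\n' 'F00001-0708-a.txt' | explicit_mv)
# Second run on the already renamed file: the explicit pattern no longer matches.
second=$(printf '%s\n' 'F0001-0708-a.txt' | explicit_mv)
# The cryptic pattern matches any name and drops another character.
again=$(printf '%s\n' 'F0001-0708-a.txt' | cryptic_mv)

printf 'first:  %s\nsecond: %s\nagain:  %s\n' "$first" "$second" "$again"
```

Here second comes out empty, while again would rename F0001-0708-a.txt to F001-0708-a.txt.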
I have a 3-step problem. I need to:
1. find all occurrences of the character : in a LaTeX file, but only when it is inside a \ref{} or a \label{}, which can also contain other characters. Example: The system's total energy (\ref{eq:E}).
2. replace those : with _. The example becomes: The system's total energy (\ref{eq_E}).
3. do this for all such occurrences of : in references or labels, in about 100 files.
I've never done this before. I've worked out that I can use regular expressions to find complex occurrences. I can find either \ref{ or \label{ with (\\ref\{|\\label\{), but I can't put it in a lookbehind because it is not fixed width. My other problem with lookbehind and lookahead is that I can only match everything between my assertions, not specific characters (from what I've understood).
I've also worked out that I can use sed for find and replace. I was planning on using a regular expression as my sed "find". Does that make sense?
And finally, I'm not sure how to go about looping on all my files (which have ordered names). Can I do an if or while loop in a bash script?
I know that my questions are all over the place, as I said, never done this before and there is a mountain of documentation I'm only beginning to tackle. Any help or pointers would be appreciated.
You can use the following command which relies on capturing groups to extract the different parts of a ref or label containing a colon to replace it with the equivalent using an underscore :
sed -E 's/\\(ref|label)\{([^:}]*):([^}]*)}/\\\1\{\2_\3}/g'
The expression captures the whole ref or label tag, matching the tag name in the first capturing group, the part that precedes the colon in the second capturing group and the part that follows the colon in the third capturing group. The replacement pattern uses references to these capturing groups and can be read as \<tagName>{<before colon>_<after colon>}.
Note that it would be preferable to use a parser that understands the LaTeX format; the regex is likely to fail for some edge cases.
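One such edge case is a label with more than one colon, since a single substitution replaces only one colon per tag. A possible workaround, sketched here as an assumption rather than as part of the answer above, is to repeat the substitution with a sed label and the t command until nothing changes. fix_labels is a hypothetical helper name and the sample line is made up.

```shell
# Replace every ":" inside \ref{...} and \label{...} on stdin.
# [^:}]* keeps the match inside one pair of braces; the ":again" / "t again"
# loop reruns the substitution until no tag contains a colon any more.
fix_labels() {
    sed -E -e ':again' \
        -e 's/\\(ref|label)\{([^:}]*):([^}]*)\}/\\\1{\2_\3}/g' \
        -e 't again'
}

printf '%s\n' 'Note: see (\ref{eq:E}) and \label{sec:intro:a}.' | fix_labels
# prints: Note: see (\ref{eq_E}) and \label{sec_intro_a}.
```

The colon after "Note" is left alone because it is not inside a \ref or \label tag.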
And finally, I'm not sure how to go about looping on all my files (which have ordered names). Can I do an if or while loop in a bash script?
sed accepts a list of files as parameters and will apply its command to all of them. The list of files can be produced by the expansion of a glob, e.g. sed 'sedCommand' /your/directory/*.txt, which would work on all files in /your/directory/ whose names end in .txt.
In this case you will likely want to use sed's -i "in place" flag, which asks sed to write its result directly to the target file rather than to its standard output. The flag can be followed by a suffix if you want a backup of the original; for instance, sed -i.bak 'command' file.txt will leave the result in file.txt and the original in file.txt.bak.
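Put together, a run over a whole directory might look like the sketch below. It assumes the files are named *.tex and live in one directory; fix_dir is a hypothetical helper name, and the substitution is a slightly more defensive spelling of the one discussed above.

```shell
# Rewrite every .tex file in DIR in place, leaving the untouched original
# next to each one as FILE.tex.bak.
fix_dir() {
    set -- "$1"/*.tex              # let the glob build the file list
    [ -e "$1" ] || return 0        # no .tex files: nothing to do
    sed -E -i.bak 's/\\(ref|label)\{([^:}]*):([^}]*)\}/\\\1{\2_\3}/g' "$@"
}

# Usage:
#   fix_dir /your/directory
```

No explicit loop is needed because sed processes each file in the list in turn. Once the results look right, the .bak files can be deleted.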
I have a collection of files where the capital letters are replaced by their ASCII-code (example ;065 for A). How can I most effectively recursively rename them from the command line?
Since I don't want to make the mess worse, I unfortunately don't know how to test any commands...
For me it would be no problem to modify the command for each letter.
Many Linux distributions ship some variant or another of the Perl rename script, sometimes as prename, sometimes as rename. Any variant will do, but not the Linux rename utility that isn't written in Perl (run it with no argument and see if the help text mentions perl anywhere). This script runs Perl code on file names, typically a regex replacement.
prename -n 's/;(03[2-9]|0[4-9][0-9]|1[01][0-9]|12[0-6])/chr($1)/eg' *
I made a regular expression that matches three-digit numbers that are the character code of a printable ASCII character. You may need to adjust it depending on exactly what can follow a semicolon. The * at the end says to rename all files in the current directory, it's just a normal shell wildcard. It's ok to include files that don't contain anything to rename: prename will just skip them.
The -n option says to show what would be done, but don't actually rename any file. Review the output. If you're happy with it, run the command again without -n to actually rename the files.
I'm dealing with a body of XML files containing unstructured texts with semantic markup for personal names.
For reasons to do with the stylesheet that will eventually show them via a web application, I need to replace:
<persName>Fred</persName>'s
<persName>Wilma</persName>'s
with
<persName>Fred's</persName>
<persName>Wilma's</persName>
I have a single line in a shell script, being run in Gitbash for Windows, below. It runs OK, but has no effect. I suppose I'm missing something obvious, perhaps to do with escaping characters, but any help appreciated.
sed -i "s/<\/persName>\'s/\'s<\/persName>/g" test.xml
You may use
sed -i "s,</persName>'s,'s</persName>,g" test.xml
Details
s - we want to replace
, - a delimiter
</persName>'s - this string to find
, - delimiter
's</persName> - replace with this string
, - delimiter
g - multiple times if more than one is found
The -i option makes the replacements directly in the file.
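Note that you do not have to escape ' when defining the sed command inside a double-quoted string. The backslash is not harmless either: inside double quotes the shell passes \' through to sed unchanged, and in GNU sed (the one Git Bash ships) \' is an anchor for the end of the pattern space rather than a literal quote, which would explain why the original command matched nothing.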
It is a good idea to use a delimiter char other than the common / if there are / chars inside the regex or/and replacement pattern.
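Before using -i, the substitution can be checked on a sample line from standard input. A minimal sketch comparing the two delimiters on the example from the question:

```shell
line="<persName>Fred</persName>'s"

# Comma as delimiter: nothing in the pattern needs escaping.
with_comma=$(printf '%s\n' "$line" | sed "s,</persName>'s,'s</persName>,g")

# Slash as delimiter: every "/" in pattern and replacement must be escaped.
with_slash=$(printf '%s\n' "$line" | sed "s/<\/persName>'s/'s<\/persName>/g")

printf '%s\n%s\n' "$with_comma" "$with_slash"
# both lines print: <persName>Fred's</persName>
```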
The comment on your question suggests an easier solution, but I guess, that there might be names where the suffix 's differs, like names ending with an s. So I chose a solution where you grab what's right and put it in the middle.
As the separator for the search and replace command in sed you can choose whatever you want. I've chosen #, so you don't have to escape the slashes in the text. The escaped parentheses capture what's inside them, and the captures are available in the replacement as \1 and \2.
sed 's#<persName>\(.*\)</persName>\(.*\)#<persName>\1\2</persName>#g' testfile
Result:
<persName>Fred's</persName>
<persName>Wilma's</persName>
If you want to replace it in file, you can use the -i parameter. But be sure to check the result first.
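One thing to check for is text after the name: because .* is greedy, the second group takes everything up to the end of the line, so any words following the possessive end up inside the tag. A narrower sketch, offered as an assumption about what is wanted, moves only an 's or a bare ' that directly follows the closing tag. move_possessive is a hypothetical helper name and the sample sentence is made up.

```shell
# Move a possessive suffix ('s, or a bare ' after names ending in s)
# from just after </persName> to just before it.
move_possessive() {
    sed "s#</persName>\('s\{0,1\}\)#\1</persName>#g"
}

printf '%s\n' "<persName>Fred</persName>'s dog and <persName>James</persName>' cat" |
    move_possessive
# prints: <persName>Fred's</persName> dog and <persName>James'</persName> cat
```

A closing quotation mark that happens to follow a name directly would be moved as well, so the output still deserves a look before using -i.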
I'm trying to use the rename perl command in Debian to rename files and remove detritus from the end of the filename.
The file names may be like this (varying length/nodes before the series/episode identifier)
A.TV.Show.S01E01.HDTV.XVid[stuff].avi
Other.Prog.S07E09.WEB.H264[things].mp4
And I want to remove everything after the SnnEnn bit and keep the file extension. For example
A.TV.Show.S01E01.avi
Other.Prog.S07E09.mp4
I don't mind having a command per file extension, although a single command that is extension agnostic would be better.
What I have so far is as follows:
rename -nv -- 's/[0-9][.].*?[.]avi$/.avi/' *.avi
I'm using -n just now so it just shows what the rename would do, without doing it.
The problem is it's losing the number at the end of the series and episode identifier - I need it to keep the first character of the matched text then throw the rest away.
What it gives me currently is files named thus:
A.TV.Show.S01E0.avi
Other.Prog.S07E0.mp4
Any idea how to do this? Is there a better pattern than I'm using?
This should work. It's capturing the part that you want to keep in parentheses, and then referring to it in the replacement as $1.
rename -nv -- 's/(^.*?S\d{2}E\d{2})\..*?\.([^.]+)$/$1.$2/' *
You need parentheses to capture parts of the strings:
s/([0-9])[.].*?[.]([^.]+)$/$1.$2/
or, you can use a look-behind instead of the first capture:
s/(?<=[0-9])[.].*?[.]([^.]+)$/.$1/
Please be patient, this post will be somewhat long...
I have a bunch of files, some of them with a simple and clean name (e.g. 1E01.txt) and some with a lot of extras:
Sample2_Name_E01_-co_032.txt
Sample2_Name_E02_-co_035.txt
...
Sample12_Name_E01_-co_061.txt
and so on. What matters here is the number after "Sample" and the letter plus number after "Name"; the rest is disposable. If I get rid of the unimportant parts, the file name reduces to the same pattern as the "clean" file names (2E01.txt, 2E02.txt, ..., 12E01.txt). I've managed to rename the files with the following expression (I came up with this one myself; I don't know if it is very elegant, but it works fine):
rename -v 's/Sample([0-9]+)_Name_([A-Z][0-9]+).*/$1$2\.txt/' *.txt
Now, the second part is adding a leading zero to file names that start with just one digit, so that 1E01.txt turns into 01E01.txt. I've managed to do this with the following (found on another StackExchange post and modified):
rename -v 'unless (/^[0-9]{2}.*\.txt/) {s/^([0-9]{1}.*\.txt)$/0$1/;s/0*([0-9]{2}\..*)/$1/}' *.txt
So I finally got to my question: is there a way to merge both expressions in just one rename command? I know I could do a bash script to automate the process, but what I want is to find a one-pass renaming solution.
thanks
You can try this command to rename 1-file.txt to 0001-file.txt
# fill zeros
$ rename 's/\d+/sprintf("%04d",$&)/e' *.txt
You can change the command a little to meet your need.
Well, if that is your "parsing" regex, then you are limiting the files that the script can act on to those matching that pattern. Thus, the sprintf using the same literal strings is not a more specialized case, and you could just do this:
s{Sample(\d+)_Name_(\p{IsUpper})(\d+)}
{sprintf "Sample%02d_Name_%s%03d", $1, $2, $3}e
;
Here, you are using the same known features again and simply formatting the accompanying numbers.
The /e switch is for 'eval' and it evaluates the replacement as Perl for each match.
I renamed some of your expressions to more standard character class symbols: [A-Z] becomes the property class \p{IsUpper}, [0-9] becomes the digit code \d (also possible \p{IsDigit} ).