Renaming mp3 files using regex

I wanted to organize my mp3 files and rename them using the pattern: artist - song.
I need a regular expression that selects all the words before the first dash, and then the last dash and all characters following it.
In the example, [09] System Of A Down -Toxicity - 03 - Chop Suey.mp3:
all the words before the first dash: System Of A Down
the last dash: -
everything else after the last dash: Chop Suey.mp3
How do I do this?

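There is a Linux command, rename, which is exactly what you want (this is the Perl-based rename; the util-linux rename shipped by some distributions uses a different syntax).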
For example, to rename all files matching "*.bak" to strip the extension, you might say
rename 's/\.bak$//' *.bak
To translate uppercase names to lower, you'd use
rename 'y/A-Z/a-z/' *
To rename [09] System Of A Down -Toxicity - 03 - Chop Suey.mp3 to Toxicity - Chop Suey.mp3:
# adjust the regex to your own requirements; \s* trims the spaces around the dashes
rename 's/.*-\s*(.*?)\s*-.*-\s*(.*)/$1 - $2/' *.mp3
There is also a Windows music player, foobar2000, which can do the job very well:
Properties -> Format from other fields...

If you are using Windows, I would highly recommend using the freeware program: MP3Tag. It will easily do the renaming you request (and a whole lot more).

You can find the "middle pattern" with (-.*-) and replace it with -; because .* is greedy, this spans from the first dash to the last one. To remove the track number you can use (\[[0-9]*\]).
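The two substitutions described above can be chained in a single sed call. A minimal sketch, assuming the track number is always a bracketed prefix such as [09]; the function name tidy_mp3_name is made up for illustration:

```shell
#!/usr/bin/env bash
# tidy_mp3_name (hypothetical helper): reads file names on stdin and
# prints "Artist - Song.ext".
tidy_mp3_name() {
  # 1) -.*- is greedy, so it spans from the first dash to the last one;
  #    replacing it with a single "-" keeps only the artist and the song.
  # 2) drop the leading "[NN]" track number and any spaces after it.
  sed -e 's/-.*-/-/' -e 's/^\[[0-9]*] *//'
}

# Example:
#   echo '[09] System Of A Down -Toxicity - 03 - Chop Suey.mp3' | tidy_mp3_name
#   System Of A Down - Chop Suey.mp3
```

A name with fewer than two dashes is passed through the first step unchanged, so it is worth checking the output before renaming anything.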

Maybe this could help:
sed 's/[[0-9]\+] \([A-Za-z ]*[^-]\) -.*[^-]- \(.*\)/\1 - \2/'
Test:
[jaypal:~/temp] echo "[09] System Of A Down -Toxicity - 03 - Chop Suey.mp3" |
sed 's/[[0-9]\+] \([A-Za-z ]*[^-]\) -.*[^-]- \(.*\)/\1 - \2/'
System Of A Down - Chop Suey.mp3
OR
awk -F"-" '{print $1"-"$4}' | sed 's/[[0-9]\+] //g'
Test:
[jaypal:~/temp] echo "[09] System Of A Down -Toxicity - 03 - Chop Suey.mp3" |
awk -F"-" '{print $1"-"$4}' | sed 's/[[0-9]\+] //g'
System Of A Down - Chop Suey.mp3
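The same split can also be sketched in bash itself with [[ =~ ]] and BASH_REMATCH, without sed or awk. This assumes a leading [NN] track tag, at least two dashes, and no dash inside the artist name or the song title; artist_song_name is a hypothetical name:

```shell
#!/usr/bin/env bash
# artist_song_name (hypothetical helper): print "Artist - Song.ext" for names like
#   [09] System Of A Down -Toxicity - 03 - Chop Suey.mp3
# Returns 1 if the name doesn't have the expected shape.
artist_song_name() {
  # Group 1: the artist (no dash, ends with a non-space).
  # Group 2: everything after the last dash (no dash, starts with a non-space).
  local re='^\[[0-9]+] ([^-]*[^ -]) *-.*- *([^ -][^-]*)$'
  [[ $1 =~ $re ]] || return 1
  printf '%s - %s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
}
```

A dry run over a directory could then look like: for f in *.mp3; do new=$(artist_song_name "$f") && echo mv -n -- "$f" "$new"; done (remove the echo once the output looks right).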

Related

Convert Python Regex to Bash Regex

I am trying to write a bash script to convert files for streaming on the home network.
I am wondering if the community could recommend something that would allow me to use my existing regex to search a string for the presence of a pattern and replace the text following a pattern.
Part of this involves naming the file to include the quality, release year and episode information (if any of these are available).
I have some Python regex I am trying to convert to a bash regex search and replace.
There are a few options, such as sed, grep or awk, but I am not sure which is best for my approach.
My existing Python regex apparently uses an extended, Perl-style form of regex.
import re
# Captures quality 1080p or 720p
determinedQuality = re.findall("[0-9]{3}[PpIi]{1}|[0-9]{4}[PpIi]{1}", next_line)
# Captures year (4 characters long and only numeric)
yearInitial = str(re.findall("[0-9]{4}[^A-Za-z]", next_line))
# Lazy programming on my part to clear up the string gathered from the year
determinedYear = re.findall("[0-9]{4}", yearInitial)
# If the string has either S00E00 or 1x99 present then it's a TV show
determinedEpisode = re.findall("[Ss]{1}[0-9]{2}[Ee]{1}[0-9]{2}|[0-9]{1}[x]{1}[0-9]{2}", next_line)
My aim is to end up with a filename all in lowercase with underscores instead of spaces in the filename along with quality information if possible:
# Sample of desired file names
harry_potter_2001_720p_philosophers_stone.mkv
S01E05_fringe_1080p.mkv
I simplified the regexes: for example, if you need 3 or 4 digits you can use {3,4}, and {1} is redundant, so you can remove it.
#!/bin/bash
INPUT="harry_potter_2001_720p_philosophers_stone.mkv"
#INPUT="S01E05_fringe_1080p.mkv"
determinedQuality=$(echo "$INPUT" | grep -Po '[0-9]{3,4}[PpIi]')
determinedYear=$(echo "$INPUT" | grep -Po '[0-9]{4}[^A-Za-z]' | grep -Po '[0-9]{4}')
determinedEpisode=$(echo "$INPUT" | grep -Po '[Ss][0-9]{2}[Ee][0-9]{2}|[0-9]x[0-9]{2}')
echo "quality: $determinedQuality"
echo "year: $determinedYear"
echo "episode: $determinedEpisode"
Output for the first one:
quality: 720p
year: 2001
episode:
Output for the second one:
quality: 1080p
year:
episode: S01E05
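If the goal is specifically bash's own regex matching, the three searches can also be sketched with [[ =~ ]] and BASH_REMATCH. These use POSIX ERE, so grep -P (a GNU grep option) is not needed; ERE has no \d or lookarounds, so the classes are spelled out. The function names and the global variables quality, year and episode are assumptions for illustration, and the lower-casing helper needs bash 4+:

```shell
#!/usr/bin/env bash
# extract_media_info (hypothetical): sets the globals quality, year, episode;
# an empty value means "not found". Each [[ =~ ]] returns the leftmost match.
extract_media_info() {
  local name=$1
  local q_re='[0-9]{3,4}[PpIi]'
  local y_re='([0-9]{4})[^A-Za-z]'
  local e_re='[Ss][0-9]{2}[Ee][0-9]{2}|[0-9]x[0-9]{2}'
  quality='' year='' episode=''
  [[ $name =~ $q_re ]] && quality=${BASH_REMATCH[0]}
  [[ $name =~ $y_re ]] && year=${BASH_REMATCH[1]}   # group 1: just the digits
  [[ $name =~ $e_re ]] && episode=${BASH_REMATCH[0]}
  return 0
}

# to_file_name (hypothetical): lower-case a title and turn spaces into "_".
to_file_name() {
  local s=${1,,}                # ${var,,} lower-cases (bash 4+)
  printf '%s\n' "${s// /_}"     # every space -> "_"
}
```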

sed search and replace only before a char

Is there a way to use sed (possibly together with other commands) to transform all the keys in a file that lists key-value pairs like this:
a.key.one-example=a_value_one
a.key.two-example=a_value_two
and I want this:
A_KEY_ONE_EXAMPLE=a_value_one
A_KEY_TWO_EXAMPLE=a_value_two
What I have done so far:
sed -e 's/^[^=]*/\U&/'
It produced this:
A.KEY.ONE-EXAMPLE=a_value_one
A.KEY.TWO-EXAMPLE=a_value_two
But I still need to replace the "." and "-" in the part to the left of the "=". I don't think this is the right way to do it.
This can be done very easily in awk. IMHO awk is the better tool for this task; it keeps things simple.
awk 'BEGIN{FS=OFS="="} {$1=toupper($1);gsub(/[.-]/,"_",$1)} 1' Input_file
Simple explanation:
Set both the field separator and the output field separator to =.
Use awk's built-in function toupper to upper-case $1 (the first field) and save the result back into $1.
Use gsub to replace every . or - in $1 with _, as required.
The trailing 1 is the idiomatic way to print the (modified) line in awk.
This might work for you (GNU sed):
sed -E 'h;y/.-/__/;s/.*/\U&/;G;s/=.*=/=/' file
Make a copy of the current line.
Translate . and - to _.
Capitalize the whole line.
Append the copy.
Remove the middle portion (everything between the first and the last =), leaving the new key and the original value.
You can use
sed ':a;s/^\([^=]*\)[.-]\([^=]*\)/\U\1_\2/g;ta' file > newfile
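Details:
:a - sets a label named a
s/^\([^=]*\)[.-]\([^=]*\)/\U\1_\2/g - matches the pattern ^\([^=]*\)[.-]\([^=]*\), i.e.
^ - start of string
\([^=]*\) - Group 1 (\1): zero or more chars other than = (greedy, so the [.-] below is the last dot or hyphen before the =)
[.-] - a dot or hyphen
\([^=]*\) - Group 2 (\2): zero or more chars other than =
and replaces it with Group 1 + _ + Group 2, upper-cased by \U
ta - jumps back to label a if a replacement was made, so the loop repeats until the key has no . or - left
Note that a key containing no . or - is never matched, so it stays in lower case.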
Demo:
#!/bin/bash
s='a.key.one-example=a_value_one
a.key.two-example=a_value_two'
sed ':a;s/^\([^=]*\)[.-]\([^=]*\)/\U\1_\2/g;ta' <<< "$s"
Output:
A_KEY_ONE_EXAMPLE=a_value_one
A_KEY_TWO_EXAMPLE=a_value_two
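For completeness, the same transformation can also be sketched in pure bash (4 or later, for ${var^^}), reading each line with IFS set to =. This assumes one key=value pair per line; upper_snake_keys is a hypothetical name:

```shell
#!/usr/bin/env bash
# upper_snake_keys (hypothetical): filter stdin, rewriting only the key.
# With IFS='=', read puts the text before the first "=" into key and the
# rest of the line (including any further "=") into value.
upper_snake_keys() {
  local key value
  # "|| [[ -n $key ]]" also handles a last line without a trailing newline
  while IFS='=' read -r key value || [[ -n $key ]]; do
    key=${key^^}              # upper-case the key
    key=${key//[.-]/_}        # every "." and "-" becomes "_"
    printf '%s=%s\n' "$key" "$value"
  done
}

# Usage: upper_snake_keys < file > newfile
```

A line without any = would come out with an = appended, so this only suits files that really are key=value lists.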

sed / awk - remove space in file name

I'm trying to replace the whitespace in file names with underscores.
Input:
echo "File Name1.xml File Name3 report.xml" | sed 's/[[:space:]]/__/g'
However, the output is:
File__Name1.xml__File__Name3__report.xml
Desired output:
File__Name1.xml File__Name3__report.xml
You named awk in the title of the question, didn't you?
$ echo "File Name1.xml File Name3 report.xml" | \
> awk -F'.xml *' '{for(i=1;i<=NF;i++){gsub(" ","_",$i); printf i<NF?$i ".xml ":"\n" }}'
File_Name1.xml File_Name3_report.xml
$
-F'.xml *' instructs awk to split on a regex: the requested extension plus zero or more spaces (the unescaped . matches any character, which is harmless here)
the loop {for(i=1;i<=NF;i++) is executed for every field the input line(s) is (are) split into. Note that the last field is empty (it is what follows the last extension), but we take that into account below...
the body of the loop:
gsub(" ","_", $i) substitutes all the occurrences of space with underscores in the current field, as indexed by the loop variable i
printf i<NF?$i ".xml ":"\n" outputs different things: if i<NF it's a regular field, so we append the extension and a space; otherwise i equals NF, and we just terminate the output line with a newline.
It's not perfect: it leaves a space after the last file name. I hope that's good enough...
▶    A D D E N D U M    ◀
I'd like to address:
the little buglet of the last space...
some of the issues reported by Ed Morton
generalize the extension provided to awk
To reach these goals, I've wrapped the scriptlet in a shell function which, since it changes spaces into underscores, is named s2u:
$ s2u () { awk -F'\.'$1' *' -v ext=".$1" '{
> NF--;for(i=1;i<=NF;i++){gsub(" ","_",$i);printf "%s",$i ext (i<NF?" ":"\n")}}'
> }
$ echo "File Name1.xml File Name3 report.xml" | s2u xml
File_Name1.xml File_Name3_report.xml
$
It's a bit different (better?) because it doesn't special-case printing the last field; instead it special-cases the delimiter appended to each field. The idea of splitting on the extension remains the same.
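If you'd rather keep the question's sed command, a two-step variant also produces the desired output: first replace every whitespace with __ as before, then turn the __ that follows each .xml back into a single space. A sketch assuming .xml followed by whitespace never appears inside a single file name; underscore_xml_names is a hypothetical name:

```shell
#!/usr/bin/env bash
# underscore_xml_names (hypothetical): filter stdin.
underscore_xml_names() {
  # 1) every whitespace -> "__" (the question's original command)
  # 2) "__" right after ".xml" was a separator, not part of a name: restore " "
  sed -e 's/[[:space:]]/__/g' -e 's/\.xml__/.xml /g'
}

# Usage: echo "File Name1.xml File Name3 report.xml" | underscore_xml_names
```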
This seems a good start if the filenames aren't delineated:
((?:\S.*?)?\.\w{1,})\b
( // start of captured group
(?: // non-captured group
\S.*? // a non-whitespace character, then zero or more characters (lazy)
)? // 0 or 1 times
\. // a dot
\w{1,} // 1 or more word characters
) // end of captured group
\b // a word boundary
You'll have to look up how a PCRE pattern converts to the regex flavor your shell tools use. Alternatively, it can be run from a Python/Perl/PHP script.
Assuming you are asking how to rename files, and not how to remove spaces in a list of file names that are being used for some other reason, here are the long way and the short way. The long way uses sed; the short way uses rename. If you are not trying to rename files, your question is quite unclear and should be revised.
If the goal is simply to get a list of xml file names and change them with sed, the bottom example shows how to do that.
directory contents:
ls -w 2
bob is over there.xml
fred is here.xml
greg is there.xml
cd [directory with files]
shopt -s nullglob
a_glob=(*.xml);
for ((i=0;i< ${#a_glob[@]}; i++));do
echo "${a_glob[i]}";
done
shopt -u nullglob
# output
bob is over there.xml
fred is here.xml
greg is there.xml
# then rename them
cd [directory with files]
shopt -s nullglob
a_glob=(*.xml);
for ((i=0;i< ${#a_glob[@]}; i++));do
# I prefer 'rename' for such things
# rename 's/[[:space:]]/_/g' "${a_glob[i]}";
# but sed works, can't see any reason to use it for this purpose though
mv -- "${a_glob[i]}" "$(sed 's/[[:space:]]/_/g' <<< "${a_glob[i]}")";
done
shopt -u nullglob
result:
ls -w 2
bob_is_over_there.xml
fred_is_here.xml
greg_is_there.xml
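Globbing is what you want here because of the spaces in the names: each element of the glob array is one complete file name, whereas parsing the output of ls would split names at the spaces.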
However, this is really a complicated solution, when actually all you need to do is:
cd [your space containing directory]
rename 's/[[:space:]]/_/g' *.xml
and that's it, you're done.
If, on the other hand, you are trying to create a list of file names, you'd certainly want the globbing method: just change the loop body so that sed transforms the name that is printed, instead of renaming the file.
If your goal is to change the filenames for output purposes, and not rename the actual files:
cd [directory with files]
shopt -s nullglob
a_glob=(*.xml);
for ((i=0;i< ${#a_glob[@]}; i++));do
echo "${a_glob[i]}" | sed 's/[[:space:]]/_/g';
done
shopt -u nullglob
# output:
bob_is_over_there.xml
fred_is_here.xml
greg_is_there.xml
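The same renaming can also be sketched with bash parameter expansion alone, without sed or rename. Iterating over a glob keeps each name with spaces intact. spaces_to_underscores is a hypothetical name, and mv -n is used so an existing file is never overwritten:

```shell
#!/usr/bin/env bash
# spaces_to_underscores (hypothetical): rename every *.xml file in a directory
# (default: the current one), replacing each whitespace character with "_".
spaces_to_underscores() {
  local dir=${1:-.} f name new
  for f in "$dir"/*.xml; do
    [[ -e $f ]] || continue              # no match: the glob stays literal
    name=${f##*/}                        # file name without the directory part
    new=${name//[[:space:]]/_}           # every whitespace character -> "_"
    [[ $new == "$name" ]] && continue    # nothing to rename
    mv -n -- "$f" "$dir/$new"            # -n: never overwrite an existing file
  done
}

# Usage: spaces_to_underscores "/path/to/dir"
```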
You could use rename:
rename --nows *.xml
This will replace all the spaces of the xml files in the current folder with _.
Some versions of rename come without the --nows option; in that case you can use a search and replace instead:
rename 's/[[:space:]]/__/g' *.xml
Optionally, add --dry-run to just print what the new file names would be, without renaming anything.

Remove characters occurring after _ in all file names, excluding the file extension (.png)

I was looking for a Unix command or shell script to remove the characters occurring after _ in all file names, while keeping the file extension.
Example:
b6d28-insurance-renewal-shop_6b5c74fa3d4b96f7557c3fd66f2555af.png
should be renamed to
b6d28-insurance-renewal-shop.png
I have searched online but was not able to find a quick and optimal solution.
Please note that those extra characters are random and vary from file to file.
Thanks in advance!
You can use sed like this using a negated character class:
f='b6d28-insurance-renewal-shop_6b5c74fa3d4b96f7557c3fd66f2555af.png'
sed 's/_[^_.]*//' <<< "$f"
b6d28-insurance-renewal-shop.png
[^_.] matches any character except DOT or underscore.
If you're using bash, you can do this in the shell itself:
echo "${f%_*}.png"
You could also use cut for the result like this:
file="b6d28-insurance-renewal-shop_6b5c74fa3d4b96f7557c3fd66f2555af.png"
new_file=$(echo "$file" | cut -d'_' -f1).$(echo "$file" | cut -d'.' -f2)
echo "New file name: ${new_file}"
Output:
New file name: b6d28-insurance-renewal-shop.png
Regex pattern:
(\_[\d\w]+)(?=(\.\w{2,3}))
to find every _akfgasfhsgfhha part right before the .ext[ension] (the lookahead checks for the extension without consuming it)
Assuming that f holds the original filename,
${f%_*}.${f##*.}
would give you the transformed filename.
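To apply this to every matching file in a directory, the expansion can go in a loop. A sketch, assuming the random suffix always starts at the last _ and every name has an extension; strip_random_suffix is a hypothetical name:

```shell
#!/usr/bin/env bash
# strip_random_suffix (hypothetical): rename NAME_<random>.png to NAME.png
# for every matching file in a directory (default: the current one).
strip_random_suffix() {
  local dir=${1:-.} f name new
  for f in "$dir"/*_*.png; do
    [[ -e $f ]] || continue            # no match: the glob stays literal
    name=${f##*/}                      # work on the file name only
    new=${name%_*}.${name##*.}         # drop "_<random>", keep the extension
    mv -n -- "$f" "$dir/$new"          # -n: never overwrite an existing file
  done
}

# Usage: strip_random_suffix "/path/to/images"
```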

How to swap text based on patterns at once with sed?

Suppose I have the string 'abbc' and I want to replace:
ab -> bc
bc -> ab
If I do the two replacements one after the other, the result is not what I want:
echo 'abbc' | sed 's/ab/bc/g;s/bc/ab/g'
abab
So what sed command can I use to get the following?
echo abbc | sed SED_COMMAND
bcab
EDIT:
Actually, the text could contain more than 2 patterns, and I don't know how many replacements I will need. Since one answer says that sed is a stream editor that replaces greedily, I think I will need to use some scripting language for this.
Maybe something like this:
sed 's/ab/~~/g; s/bc/ab/g; s/~~/bc/g'
Replace ~ with a character that you know won't be in the string.
I always use multiple statements with "-e"
$ sed -e 's:AND:\n&:g' -e 's:GROUP BY:\n&:g' -e 's:UNION:\n&:g' -e 's:FROM:\n&:g' file > readable.sql
This inserts a newline before every AND, GROUP BY, UNION and FROM: & stands for the matched string, so \n& replaces each match with a newline followed by the match itself (\n in the replacement is a GNU sed extension).
sed is a stream editor. It searches and replaces greedily. The only way to do what you asked for is using an intermediate substitution pattern and changing it back in the end.
echo 'abcd' | sed -e 's/ab/xy/;s/cd/ab/;s/xy/cd/'
Here is a variation on ooga's answer that works for multiple search and replace pairs without having to check how values might be reused:
sed -i '
s/\bAB\b/________BC________/g
s/\bBC\b/________CD________/g
s/________//g
' path_to_your_files/*.txt
Here is an example:
before:
some text AB some more text "BC" and more text.
after:
some text BC some more text "CD" and more text.
Note that \b denotes word boundaries, which is what prevents the ________ from interfering with the search (I'm using GNU sed 4.2.2 on Ubuntu). If you are not using a word boundary search, then this technique may not work.
Also note that this gives the same results as removing the s/________//g and appending && sed -i 's/________//g' path_to_your_files/*.txt to the end of the command, but doesn't require specifying the path twice.
A general variation on this would be to use \x0 or _\x0_ in place of ________ if you know that no nulls appear in your files, as jthill suggested.
Here is an excerpt from the SED manual:
-e script
--expression=script
Add the commands in script to the set of commands to be run while processing the input.
Prefix each substitution with the -e option to chain them together. The following example works for me:
sed < ../.env-turret.dist \
-e "s/{{ name }}/turret$TURRETS_COUNT_INIT/g" \
-e "s/{{ account }}/$CFW_ACCOUNT_ID/g" > ./.env.dist
This example also shows how to use environment variables in your substitutions.
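One caveat with variables in the replacement: if a value contains /, & or \, sed interprets those characters. A sketch of a helper (sed_escape_replacement is a hypothetical name) that escapes them first, assuming the value contains no newline and / is the s delimiter:

```shell
#!/usr/bin/env bash
# sed_escape_replacement (hypothetical): prefix "\", "/" and "&" with a
# backslash so the value is inserted literally on the replacement side of s///.
# "|" is used as the delimiter here so the bracket expression needs no "\/".
sed_escape_replacement() {
  printf '%s\n' "$1" | sed -e 's|[\\/&]|\\&|g'
}

# Usage:
#   esc=$(sed_escape_replacement "$CFW_ACCOUNT_ID")
#   sed -e "s/{{ account }}/$esc/g" < in > out
```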
This might work for you (GNU sed):
sed -r '1{x;s/^/:abbc:bcab/;x};G;s/^/\n/;:a;/\n\n/{P;d};s/\n(ab|bc)(.*\n.*:(\1)([^:]*))/\4\n\2/;ta;s/\n(.)/\1\n/;ta' file
This uses a lookup table which is prepared and kept in the hold space (HS), then appended to each line. A unique marker (in this case \n) is prepended to the start of the line and used to bump the search along the length of the line. Once the marker reaches the end of the line, the process is finished and the line is printed, with the lookup table and markers discarded.
N.B. The lookup table is prepped at the very start and a second unique marker (in this case :) chosen so as not to clash with the substitution strings.
With some comments:
sed -r '
# initialize hold with :abbc:bcab
1 {
x
s/^/:abbc:bcab/
x
}
G # append hold to patt (after a \n)
s/^/\n/ # prepend a \n
:a
/\n\n/ {
P # print patt up to first \n
d # delete patt & start next cycle
}
s/\n(ab|bc)(.*\n.*:(\1)([^:]*))/\4\n\2/
ta # goto a if sub occurred
s/\n(.)/\1\n/ # move one char past the first \n
ta # goto a if sub occurred
'
The table works like this:
   **   ** replacement
:abbc:bcab
 **   **   pattern
Tcl has a builtin for this
$ tclsh
% string map {ab bc bc ab} abbc
bcab
This works by walking the string a character at a time doing string comparisons starting at the current position.
In perl:
perl -E '
sub string_map {
my ($str, %map) = @_;
my $i = 0;
while ($i < length $str) {
KEYS:
for my $key (keys %map) {
if (substr($str, $i, length $key) eq $key) {
substr($str, $i, length $key) = $map{$key};
$i += length($map{$key}) - 1;
last KEYS;
}
}
$i++;
}
return $str;
}
say string_map("abbc", "ab"=>"bc", "bc"=>"ab");
'
bcab
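For the two-string case there is also a placeholder-free approach that fits in awk: split each line on the first string, turn every occurrence of the second string inside the pieces into the first, then join the pieces back together with the second string. A sketch (swap_strings is a hypothetical name), assuming neither string contains regex metacharacters or &, since awk's split and gsub treat them as a regex and a replacement template:

```shell
#!/usr/bin/env bash
# swap_strings A B (hypothetical): swap every A with B and every B with A.
swap_strings() {
  awk -v a="$1" -v b="$2" '{
    n = split($0, piece, a)      # pieces of the line between occurrences of A
    out = ""
    for (i = 1; i <= n; i++) {
      gsub(b, a, piece[i])       # inside the pieces, B becomes A
      out = out (i > 1 ? b : "") piece[i]   # each former A becomes B
    }
    print out
  }'
}

# Usage: echo abbc | swap_strings ab bc
```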
Maybe a simpler approach, if each pattern occurs only once:
echo 'abbc' | sed 's/ab/bc/;s/bc/ab/2'
My output:
~# echo 'abbc' | sed 's/ab/bc/;s/bc/ab/2'
bcab
For multiple occurrences of pattern:
sed 's/\(ab\)\(bc\)/\2\1/g'
Example
~# cat try.txt
abbc abbc abbc
bcab abbc bcab
abbc abbc bcab
~# sed 's/\(ab\)\(bc\)/\2\1/g' try.txt
bcab bcab bcab
bcab bcab bcab
bcab bcab bcab
Hope this helps !!
echo "C:\Users\San.Tan\My Folder\project1" | sed -e 's|^C:\\|/mnt/c/|' -e 's|\\|/|g'
converts
C:\Users\San.Tan\My Folder\project1
to
/mnt/c/Users/San.Tan/My Folder/project1
in case someone needs to convert Windows paths to Windows Subsystem for Linux (WSL) paths.
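A sketch of the same conversion in pure bash that also handles other drive letters. win_to_wsl is a hypothetical name; it assumes the path starts with a drive letter such as C:\ and lower-cases that letter, matching how WSL mounts drives under /mnt by default:

```shell
#!/usr/bin/env bash
# win_to_wsl (hypothetical): C:\Users\me -> /mnt/c/Users/me
win_to_wsl() {
  local p=$1
  local re='^([A-Za-z]):[\\/](.*)$'   # drive letter, colon, first separator
  if [[ $p =~ $re ]]; then
    # ${x,,} lower-cases the drive letter (bash 4+)
    p="/mnt/${BASH_REMATCH[1],,}/${BASH_REMATCH[2]}"
  fi
  printf '%s\n' "${p//\\//}"          # every remaining "\" becomes "/"
}
```

On WSL itself, the bundled wslpath utility performs this conversion as well.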
If the replacement comes from a shell variable, the single-quoted solution doesn't work, because the shell does not expand variables inside single quotes.
Put the sed command in double quotes instead:
sed -e "s/#replacevarServiceName#/$varServiceName/g" -e "s/#replacevarImageTag#/$varImageTag/g" deployment.yaml
Here is an awk version based on ooga's sed:
echo 'abbc' | awk '{gsub(/ab/,"xy");gsub(/bc/,"ab");gsub(/xy/,"bc")}1'
bcab
I believe this should solve your problem. I may be missing a few edge cases, please comment if you notice one.
You need a way to exclude previous substitutions from future patterns, which really means making outputs distinguishable, as well as excluding these outputs from your searches, and finally making outputs indistinguishable again. This is very similar to the quoting/escaping process, so I'll draw from it.
s/\\/\\\\/g escapes all existing backslashes
s/ab/\\b\\c/g replaces raw ab with escaped bc
s/bc/\\a\\b/g replaces raw bc with escaped ab
s/\\\(.\)/\1/g replaces every escaped X with raw X
I have not accounted for backslashes in ab or bc, but intuitively, I would escape the search and replace terms the same way - \ now matches \\, and substituted \\ will appear as \.
Until now I have been using backslashes as the escape character, but it's not necessarily the best choice. Almost any character should work, but be careful with the characters that need escaping in your environment, sed, etc. depending on how you intend to use the results.
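Put together for the abbc example, the four steps above form one sed command. A minimal sketch (swap_ab_bc is a hypothetical name) using \ as the escape character as described:

```shell
#!/usr/bin/env bash
# swap_ab_bc (hypothetical): filter stdin, swapping ab <-> bc.
# Outputs of earlier steps are written in escaped form (\x), so the later
# steps, which only match raw text, cannot touch them.
swap_ab_bc() {
  sed -e 's/\\/\\\\/g' \
      -e 's/ab/\\b\\c/g' \
      -e 's/bc/\\a\\b/g' \
      -e 's/\\\(.\)/\1/g'
}
# steps: escape existing "\", raw ab -> escaped bc, raw bc -> escaped ab,
# then unescape everything

# Usage: echo abbc | swap_ab_bc
```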
Every answer posted thus far seems to agree with the statement by kuriouscoder made in his above post:
The only way to do what you asked for is using an intermediate
substitution pattern and changing it back in the end
If you are going to do this, however, and your usage might involve more than some trivial string (maybe you are filtering data, etc.), the best character to use with sed is a newline. Since sed is 100% line-based, a newline is the one-and-only character you are guaranteed never to receive when a new line is fetched (forget about GNU multi-line extensions for this discussion).
To start with, here is a very simple approach to solving your problem using newlines as an intermediate delimiter:
echo "abbc" | sed -E $'s/ab|bc/\\\n&/g; s/\\nab/bc/g; s/\\nbc/ab/g'
With simplicity comes some trade-offs... if you had more than a couple variables, like in your original post, you have to type them all twice. Performance might be able to be improved a little bit, too.
It gets pretty nasty to do much beyond this using sed. Even with some of the more advanced features like branching control and the hold buffer (which is really weak IMO), your options are pretty limited.
Just for fun, I came up with this one alternative, but I don't think I would have any particular reason to recommend it over the one from earlier in this post... You have to essentially make your own "convention" for delimiters if you really want to do anything fancy in sed. This is way-overkill for your original post, but it might spark some ideas for people who come across this post and have more complicated situations.
My convention below was: use multiple newlines to "protect" or "unprotect" the part of the line you're working on. One newline denotes a word boundary. Two newlines denote alternatives for a candidate replacement. I don't replace right away, but rather list the candidate replacement on the next line. Three newlines mean that a value is "locked-in", like your original post was trying to do with ab and bc. After that point, further replacements will be undone, because they are protected by the newlines. A little complicated if I don't say so myself... ! sed isn't really meant for much more than the basics.
# Newlines
NL=$'\\\n'
NOT_NL=$'[\x01-\x09\x0B-\x7F]'
# Delimiters
PRE="${NL}${NL}&${NL}"
POST="${NL}${NL}"
# Un-doer (if a request was made to modify a locked-in value)
tidy="s/(\\n\\n\\n${NOT_NL}*)\\n\\n(${NOT_NL}*)\\n(${NOT_NL}*)\\n\\n/\\1\\2/g; "
# Locker-inner (three newlines means "do not touch")
tidy+="s/(\\n\\n)${NOT_NL}*\\n(${NOT_NL}*\\n\\n)/\\1${NL}\\2/g;"
# Finalizer (remove newlines)
final="s/\\n//g"
# Input/Commands
input="abbc"
cmd1="s/(ab)/${PRE}bc${POST}/g"
cmd2="s/(bc)/${PRE}ab${POST}/g"
# Execute
echo "${input}" | sed -E "${cmd1}; ${tidy}; ${cmd2}; ${tidy}; ${final}"