regexp for renaming multiple files using 'rename.pl' - regex

I'm using rename.pl to rename multiple files. I am having trouble coming up with the right regexp. My file names are of the form:
nn.some.title.string.ext
I want to change just the first '.' to ' - '. I thought this would work but it does not.
s/\.?/ - /
Can someone help me out with this? TIA.

\.? can match a sequence of zero characters, so s/\.?/ - / replaces the dot or the empty string at the start of the input.
"abc.def.ghi" ⇒ " - abc.def.ghi"
".abc" ⇒ " - abc"
To replace the first ., you can use the following:
s/\./ - /
"abc.def.ghi" ⇒ "abc - def.ghi"
To substitute all . but a leading one or the one in the extension, you can use the following:
s/(?!^)\.(?!\w+\z)/ - /g
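The three substitutions above can be tried out in a short Python sketch. This is only an illustration under assumptions: the function names are made up here, `count=1` stands in for Perl's substitution without `/g`, and `\Z` is Python's spelling of Perl's `\z`.

```python
import re

def optional_dot_to_dash(name: str) -> str:
    # \.? can match the empty string, so the first match is at position 0
    return re.sub(r"\.?", " - ", name, count=1)

def first_dot_to_dash(name: str) -> str:
    # count=1 replaces only the first literal dot
    return re.sub(r"\.", " - ", name, count=1)

def inner_dots_to_dash(name: str) -> str:
    # every dot except a leading one and the one before the extension
    return re.sub(r"(?!^)\.(?!\w+\Z)", " - ", name)

print(optional_dot_to_dash("abc.def.ghi"))
print(first_dot_to_dash("nn.some.title.string.ext"))
print(inner_dots_to_dash("nn.some.title.string.ext"))
```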

You will probably want to make sure that the first dot is not also the last one. That is, if a file happens to be named nn_some_title_string.ext, the script should leave its only dot (the one before the extension) alone.
$fileName = "nn.some.title.string.ext";
$fileName =~s/\.(?=\w+\.\w+)/-/;
print "FileName: " . $fileName ."\n";

Try one of these simple patterns.
my $str = "nn.some.title.string.ext";
# ^([^.]*) captures everything from the start up to the first dot
$str =~ s/^([^.]*)\./$1-/;
or
$str =~ s/\./-/;
print $str;

Related

Regex match until third occurrence of a char is found, counting occurrence of said char starting from the end of string

Let's dive in : Input :
p9_rec_tonly_.cr_called.seg
p9_tonly_.cr_called.seg
p10_nor_nor_.cr_called.seg
p10_rec_tn_.cr_called.seg
p10_tn_.cr_called.seg
p26_rec_nor_nor_.cr_called.seg
p26_rec_tn_.cr_called.seg
p26_tn_.cr_called.seg
Desired output :
p9_rec
p9
p10_nor
p10_rec
p10
p26_rec_nor
p26_rec
p26
Starting from the beginning of the string, I need to match up to the third occurrence of "_" (underscore), where the occurrences are counted from the end of the string.
Any tips are appreciated,
Best regards
I believe this regex should do the trick!
^.*?(?=_[^_]*_[^_]*_[^_]*$)
Explanation:
^ the start of the line
.*? matches as few characters as possible (lazy), growing only until the lookahead succeeds
(?=...) asserts that its contents follow our match
_[^_]*_[^_]*_[^_]* Looks for exactly three underscores after our match.
$ the end of the line
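As a quick check of the explanation above, here is a minimal Python sketch that applies the same pattern to a few of the sample names. The constant and function names are assumptions made for this illustration.

```python
import re

# Lazy prefix, followed by exactly three underscores up to the end of the string
PREFIX = re.compile(r"^.*?(?=_[^_]*_[^_]*_[^_]*$)")

def prefix_before_third_last_underscore(name):
    match = PREFIX.match(name)
    # None when the name contains fewer than three underscores
    return match.group(0) if match else None

for name in ("p9_rec_tonly_.cr_called.seg", "p9_tonly_.cr_called.seg"):
    print(prefix_before_third_last_underscore(name))
```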
You should think beyond regex to solve this problem. For example, if you are using Python just use rsplit with a limit of 3 and get the first resulting string:
>>> data = [
...     'p9_rec_tonly_.cr_called.seg',
...     'p9_tonly_.cr_called.seg',
...     'p10_nor_nor_.cr_called.seg',
...     'p10_rec_tn_.cr_called.seg',
...     'p10_tn_.cr_called.seg',
...     'p26_rec_nor_nor_.cr_called.seg',
...     'p26_rec_tn_.cr_called.seg',
...     'p26_tn_.cr_called.seg',
... ]
>>> for d in data:
...     print(d.rsplit('_', 3)[0])
...
p9_rec
p9
p10_nor
p10_rec
p10
p26_rec_nor
p26_rec
p26
Bash, you say? It's not a regular expression, but bash can strip a suffix using pattern-based parameter expansion:
while read -r var ; do echo "${var%_*_*_*}" ; done <<EOT
p9_rec_tonly_.cr_called.seg
p9_tonly_.cr_called.seg
p10_nor_nor_.cr_called.seg
p10_rec_tn_.cr_called.seg
p10_tn_.cr_called.seg
p26_rec_nor_nor_.cr_called.seg
p26_rec_tn_.cr_called.seg
p26_tn_.cr_called.seg
EOT
${var%_*_*_*} expands the variable var with the shortest suffix matching the pattern _*_*_* stripped.
Otherwise, to use a real regex in the shell, you would normally pipe your lines through a utility such as sed, for instance:
sed -e 's#_[^_]*_[^_]*_[^_]*$##'
or for short:
sed -e 's#\(_[^_]*\)\{3\}$##'
Both find three groups, each made of a _ followed by zero or more characters other than _, anchored at the end of the line ($), and replace them with nothing ('').
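The same suffix-stripping idea as the sed commands can be sketched in Python. This is an illustrative translation, not part of the original answer, and the function name is an assumption.

```python
import re

def strip_last_three_fields(name: str) -> str:
    # Three repetitions of "_ plus non-underscore characters", anchored at the end
    return re.sub(r"(?:_[^_]*){3}$", "", name)

print(strip_last_three_fields("p10_tn_.cr_called.seg"))
print(strip_last_three_fields("p26_rec_nor_nor_.cr_called.seg"))
```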

(simple) AHK: RegexMatch "\n[^\n]$" doesn't work

What am I doing wrong here?
Shells := "`nAlpha`nBetta`nOmega"
RegexMatch(Shells, "\n[^\n]$", LastLetter)
MsgBox % "The last letter is: " . LastLetter
The last letter should be Omega, but it doesn't happen so in my case.
EDIT:
1) "`n" is a single LineFeed character.
2) LastLetter is (the name of) a variable that must contain string "`nOmega".
[^\n] matches exactly one character, so you have to add a quantifier to it. I also used the \z token (I'm not sure how multi-line mode is handled in the AutoHotKey regex engine, but you can leave $ intact if multi-line mode is off by default):
RegexMatch(Shells, "\n[^\n]*\z", LastLetter)
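The effect of the missing quantifier can be reproduced outside AutoHotKey. The following Python sketch is an analogy only (Python's `re` is a different engine, and `\Z` is its end-of-string token); the variable names are assumptions.

```python
import re

shells = "\nAlpha\nBetta\nOmega"

# Without a quantifier: a newline, ONE non-newline character, then the end
no_quantifier = re.search(r"\n[^\n]$", shells)

# With *: a newline, any run of non-newline characters, then the end
with_quantifier = re.search(r"\n[^\n]*\Z", shells)

print(no_quantifier)                   # no match object
print(repr(with_quantifier.group(0)))
```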

perl Regex replace for specific string length

I am using Perl to do some prototyping.
I need an expression to replace e by [ee] if the string is exactly 2 chars and finishes by "e".
le -> l [ee]
me -> m [ee]
elle -> elle : no change
I cannot test the length of the string, I need one expression to do the whole job.
I tried:
`s/(?=^.{0,2}\z).*e\z%/[ee]/g` but this is replacing the whole string
`s/^[c|d|j|l|m|n|s|t]e$/[ee]/g` same result (I listed the possible letters that could precede my "e")
`^(?<=[c|d|j|l|m|n|s|t])e$/[ee]/g` but I have no match, not sure I can use ^ on a positive look behind
EDIT
Guys you're amazing, hours of search on the web and here I get answers minutes after I posted.
I tried all your solutions and they are working perfectly directly in my script, i.e. this one:
my $test2="le";
$test2=~ s/^(\S)e$/\1\[ee\]/g;
print "test2:".$test2."\n";
-> test2:l[ee]
But I am loading these regex from a text file (using Perl for proto, the idea is to reuse it with any language implementing regex):
In the text file I store for example (I used % to split the line between match and replace):
^(\S)e$% \1\[ee\]
and then I parse and apply all regex like that:
my $test="le";
while (my $row = <$fh>) {
    chomp $row;
    if( $row =~ /%/){
        my @reg = split /%/, $row;
        #if no replacement, put empty string
        if($#reg == 0){
            push(@reg,"");
        }
        print "reg found, reg:".$reg[0].", replace:".$reg[1]."\n";
        push @regs, [ @reg ];
    }
}
print "orgine:".$test."\n";
for my $i (0 .. $#regs){
    my $p=$regs[$i][0];
    my $r=$regs[$i][1];
    $test=~ s/$p/$r/g;
}
print "final:".$test."\n";
This technique is working well with my other regex, but not yet when I have a $1 or \1 in the replace... here is what I am obtaining:
final:\1\ee\
PS: you answered to initial question, should I open another post ?
Something like s/(?i)^([a-z])e$/$1\[ee\]/ (the brackets are escaped so that Perl does not read $1[ee] as an array element).
Why aren't you using a capture group to do the replacement?
`s/^([cdjlmnst])e$/\1 [ee]/g`
If those are the characters you need, and there is indeed one word per line with no whitespace before or after it, then this will work. Note that | is a literal character inside a character class, so the pipes are dropped here.
Here's another option, depending on what you are looking for. It matches a two-character string consisting of one a-z character followed by one 'e' on its own line, with optional whitespace before or after. It replaces this with the single a-z character followed by ' [ee]':
`s/^\s*([a-z])e\s*$/\1 [ee]/`
^(\S)e$
Try this. Replace by $1 [ee]. See demo.
https://regex101.com/r/hR7tH4/28
I'd do something like this
$word =~ s/^(\w)(e)$/$1\[$2$2\]/;
You can use the following regex, which matches any two letters, and replace the match with $1\[$2$2\]:
^([a-zA-Z])([a-zA-Z])$
Demo :
$my_string =~ s/^([a-zA-Z])([a-zA-Z])$/$1\[$2$2\]/;
See demo https://regex101.com/r/iD9oN4/1
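On the follow-up about rules loaded from a text file: in Perl, the contents of $r in s/$p/$r/g are inserted as plain text, so a \1 or $1 stored in the file is not treated as a backreference. Since the stated goal is to reuse the rules from other languages, here is a minimal Python sketch, where re.sub does interpret \1 inside a replacement string. The names load_rules and apply_rules and the sample rule are assumptions made for illustration, and the rule is written without the \[ escapes, which Python's replacement templates do not need.

```python
import re

def load_rules(lines):
    """Parse 'pattern%replacement' lines into (compiled pattern, replacement) pairs."""
    rules = []
    for line in lines:
        line = line.rstrip("\n")
        if "%" not in line:
            continue
        # partition yields an empty replacement when nothing follows the %
        pattern, _, replacement = line.partition("%")
        rules.append((re.compile(pattern), replacement))
    return rules

def apply_rules(text, rules):
    for pattern, replacement in rules:
        # \1 in the replacement string is expanded to the first capture group
        text = pattern.sub(replacement, text)
    return text

rules = load_rules([r"^(\S)e$%\1[ee]"])
print(apply_rules("le", rules))
print(apply_rules("elle", rules))
```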

Concatenate regex s+ w+ ... perl

I have entries like that :
XYZABC------------HGTEZCW
ZERTAE------------RCBCVQE
I would like to get just HGTEZCW and RCBCVQE .
I would like to use a generic regex.
$temp=~ s/^\s+//g; (1)
$temp=~ s/^\w+[-]+//g; (2)
If I apply (1) and then (2), it works:
I get HGTEZCW, then RCBCVQE, and so on.
I would like to know whether it is possible to do that in one line, like:
$temp=~ s/^\s+\w+[-]+//g; (3)
When I use (3), the string is left unchanged: XYZABC------------HGTEZCW
I don't understand why (1) and (2) cannot be concatenated into one line.
Sorry, my entries were:
XYZABC------------HGTEZCW
ZERTAE------------RCBCVQE
Also, regex (1) removes the whitespace, and regex (2) then removes XYZABC------------ .
But the combination (3) doesn't work;
I still get XYZABC------------HGTEZCW
@Tim So there always is whitespace at the start of each string?
yes
Your regex (1) removes whitespace from the start of the string. So it does nothing on your example strings.
Regex (2) removes all alphanumerics from the start of the string plus the dashes that follow them, leaving whatever comes after the last dash.
If you combine both, the regex fails because there is no whitespace \s+ could match - therefore the entire regex fails.
To fix this, simply make the whitespace optional. Also you don't need to enclose the - in brackets:
$temp=~ s/^\s*\w+-+//g;
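The difference between the required \s+ and the optional \s* can be seen in a small Python sketch. The function names are assumptions made for this illustration; the patterns are the ones from regex (3) and from the fix above.

```python
import re

def strip_prefix_required_ws(entry: str) -> str:
    # \s+ demands at least one whitespace character at the start
    return re.sub(r"^\s+\w+-+", "", entry)

def strip_prefix_optional_ws(entry: str) -> str:
    # \s* also accepts entries with no leading whitespace
    return re.sub(r"^\s*\w+-+", "", entry)

print(strip_prefix_required_ws("XYZABC------------HGTEZCW"))
print(strip_prefix_optional_ws("XYZABC------------HGTEZCW"))
print(strip_prefix_optional_ws("  ZERTAE------------RCBCVQE"))
```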
This should do the trick.
$Str = '
XYZABC------------HGTEZCW
ZERTAE------------RCBCVQE
';
@Matches = ($Str =~ m#^.+-(\w+)$#mg);
print join "\n", @Matches;
If you only need the last seven characters of each entry, you could do the following:
my ($last7) = $temp =~ /(.{7})$/;

perl regex grouping overload

I am using the following perl regex lines
$myalbum =~ s/[-_'&’]/ /g;
$myalbum =~ s/[,’.]//g;
$myalbum =~ m/([A-Z0-9\$]+) +([A-Z0-9\$]+) +([A-Z0-9\$]+) +([A-Z0-9\$]+) +([A-Z0-9\$]+)/i;
to match the following strings
"30_Seconds_To_Mars_-_30_Seconds_To_Mars"
"30_Seconds_To_Mars_-_A_Beautiful_Lie"
"311_-_311"
"311_-_From_Chaos"
"311_-_Grassroots"
"311_-_Sound_System"
What I am experiencing is that for strings with fewer than five words (e.g. 311_-_311), attempting to print $1 $2 $3 prints nothing at all. Only strings with at least five words print anything.
How do I resolve this?
It looks like you just want the words in separate groups. To me, it seems like you're abusing regexes to do that when you could just run your substitutions and then split. Just do:
$myalbum =~ s/[-_'&’]/ /g;
$myalbum =~ s/[,’.]//g;
my @myalbum_list = split(' ', $myalbum); # ' ' splits on runs of whitespace, so no empty fields
#Print out whatever it is you want/ test length, etc...
print "$myalbum_list[0] $myalbum_list[1] $myalbum_list[2]";
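To see both the failure and the split-based alternative side by side, here is a minimal Python sketch. The names FIVE_WORDS, clean and album_words are assumptions made for this illustration; the character classes are copied from the question.

```python
import re

# The question's five-group pattern: every group and every space run is mandatory
FIVE_WORDS = re.compile(
    r"([A-Z0-9$]+) +([A-Z0-9$]+) +([A-Z0-9$]+) +([A-Z0-9$]+) +([A-Z0-9$]+)",
    re.IGNORECASE,
)

def clean(raw: str) -> str:
    text = re.sub(r"[-_'&’]", " ", raw)   # separators become spaces
    return re.sub(r"[,’.]", "", text)     # punctuation is dropped

def album_words(raw: str) -> list:
    # split() with no argument splits on runs of whitespace
    return clean(raw).split()

print(FIVE_WORDS.search(clean("311_-_311")))   # no match: only two words
print(album_words("311_-_311"))
print(album_words("30_Seconds_To_Mars_-_A_Beautiful_Lie"))
```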
The + character means one or more occurrences. This means your regex m/([A-Z0-9\$]+) +([A-Z0-9\$]+) + ... requires all five fields to be present before it is considered a match. The reason you are not capturing anything is that the regex is not matching at all.
You are probably looking for the * character, which means zero or more rather than one or more like +.
I suppose your capturing groups are empty for "311 - 311" because this string doesn't match your regex.
How to resolve? Use * instead of + to permit empty sequences.
Edit: From your post I guess you want to extract the album name, i.e. the part before the minus sign.
Why not match against '(.*) - (.*)', with the first group being the album and the second the title? The problem is with strings like "Album with minus - sign - First track" or "My Album - Track is one - two - three". But even a human wouldn't know where the album ends and the track starts there.