Regex to alter filename delimiters

I'm fairly new to regex. I can write expressions for most simple file-renaming jobs now, but this one has me stuck.
I'm just trying to change the delimiter in a bunch of filenames from " -" to " - ". Some examples:
"Author Name -Series 00 -Title.txt" needs to become:
"Author Name - Series 00 - Title.txt"
"Author_Name -[Series 01] -Title -Genre.txt" needs to become:
"Author_Name - [Series 01] - Title - Genre.txt"
The expression needs to cope with 1, 2 or 3 " -" delimiters and must ignore all other hyphens: "-", "- " and existing " - " should all be left alone. For example:
"File_Name1 - Sometext- more-info (V1.0).txt" Should not be changed at all.
It's for use in File Renamer, which is in Python.

You can use a positive look-ahead: search with the following pattern, then replace each match with " - ". The pattern begins with a literal space (a space, a hyphen, then the look-ahead):
 -(?=[^ ])
or, with the whitespace class \s in place of the literal space (note that \s also matches tabs and other whitespace):
\s-(?=[^ ])
Here is an example to test the pattern in JavaScript:
// expected:
// "Author Name -Series 00 -Title.txt" ->
// "Author Name - Series 00 - Title.txt"
// "Author_Name -[Series 01] -Title -Genre.txt" ->
// "Author_Name - [Series 01] - Title - Genre.txt"
// "File_Name1 - Sometext- more-info (V1.0).txt" ->
// no change
var regex = / -(?=[^ ])/g;
var texts = [
"Author Name -Series 00 -Title.txt",
"Author_Name -[Series 01] -Title -Genre.txt",
"File_Name1 - Sometext- more-info (V1.0).txt"
];
for(var i = 0; i < texts.length; i++) {
var text = texts[i];
console.log(text, "->", text.replace(regex, ' - '));
}
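Since File Renamer is said to be written in Python, the same look-ahead can be tried with Python's re module. This is only a sketch: the function name fix_delimiters is made up for illustration, and how File Renamer actually applies patterns is not shown here.

```python
import re

# A space and a hyphen, matched only when the next character is not a space.
# The look-ahead does not consume that character, so only " -" is replaced.
DELIMITER = re.compile(r" -(?=[^ ])")


def fix_delimiters(filename: str) -> str:
    """Turn every " -" that is not already part of " - " into " - "."""
    return DELIMITER.sub(" - ", filename)


for name in [
    "Author Name -Series 00 -Title.txt",
    "Author_Name -[Series 01] -Title -Genre.txt",
    "File_Name1 - Sometext- more-info (V1.0).txt",
]:
    print(name, "->", fix_delimiters(name))
```

The third name comes back unchanged because none of its hyphens is both preceded by a space and followed by a non-space character.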

Related

Remove only non-leading and non-trailing spaces from a string in Ruby?

I'm trying to write a Ruby method that will return true only if the input is a valid phone number, which means, among other rules, it can have spaces and/or dashes between the digits, but not before or after the digits.
In a sense, I need a method that does the opposite of String#strip! (remove all spaces except leading and trailing spaces), plus the same for dashes.
I've tried using String#gsub!, but when I try to match a space or a dash between digits, then it replaces the digits as well as the space/dash.
Here's an example of the code I'm using to remove spaces. I figure once I know how to do that, it will be the same story with the dashes.
def valid_phone_number?(number)
phone_number_pattern = /^0[^0]\d{8}$/
# remove spaces
number.gsub!(/\d\s+\d/, "")
return number.match?(phone_number_pattern)
end
What happens is if I call the method with the following input:
valid_phone_number?(" 09 777 55 888 ")
I get false because the gsub! call transforms the number into " 0788 ", i.e. it gets rid of the digits around the spaces as well as the spaces. What I want it to do is just to get rid of the inner spaces, so as to produce " 0977755888 ".
I've tried
number.gsub!(/\d(\s+)\d/, "") and number.gsub!(/\d(\s+)\d/) { |match| "" } to no avail.
Thank you!!
If you want to return a boolean, you might for example use a pattern that accepts leading and trailing spaces, and matches 10 digits (as in your example data) where there can be optional spaces or hyphens in between.
^ *\d(?:[ -]*\d){9} *$
For example
def valid_phone_number?(number)
phone_number_pattern = /^ *\d(?:[ -]*\d){9} *$/
return number.match?(phone_number_pattern)
end
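For comparison, here is a rough Python sketch of the same validation idea. It only checks the shape described above (10 digits, optional spaces or hyphens between digits, optional spaces around them); it does not reproduce the leading-0 rule from the question's own pattern, and the name valid_phone_number is just an illustrative choice.

```python
import re

# Optional leading/trailing spaces, exactly 10 digits, and any run of
# spaces or hyphens allowed between two digits. re.ASCII keeps \d to 0-9.
PHONE = re.compile(r" *\d(?:[ -]*\d){9} *", re.ASCII)


def valid_phone_number(number: str) -> bool:
    """Return True when the whole string has the expected shape."""
    return PHONE.fullmatch(number) is not None


print(valid_phone_number(" 09 777 55 888 "))
print(valid_phone_number("-0977755888"))
```

Using fullmatch means the pattern must cover the entire string, so no explicit ^ and $ anchors are needed in this sketch.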
To remove spaces and hyphens in between digits, try:
(?:\d+|\G(?!^)\d+)\K[- ]+(?=\d)
(?: - Open non-capture group;
\d+ - Match 1+ digits;
| - Or;
\G(?!^)\d+ - Assert position at the end of the previous match (but not at the start of the line), followed by 1+ digits;
)\K - Close non-capture group and reset matching point;
[- ]+ - Match 1+ space/hyphen;
(?=\d) - Assert position is followed by digits.
p " 09 777 55 888 ".gsub(/(?:\d+|\G(?!^)\d+)\K[- ]+(?=\d)/, '')
Prints: " 0977755888 "
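Python's built-in re module does not support \K or \G, so the pattern above cannot be copied over as-is. A different but related technique, sketched below, uses a look-behind and a look-ahead so that only the separators between two digits are matched. The function name is hypothetical.

```python
import re

# One or more spaces/hyphens that sit directly between two digits.
# The look-behind and look-ahead check the digits without consuming them.
INNER_SEPARATORS = re.compile(r"(?<=\d)[- ]+(?=\d)")


def strip_inner_separators(number: str) -> str:
    """Remove spaces and hyphens between digits, keeping outer ones."""
    return INNER_SEPARATORS.sub("", number)


print(repr(strip_inner_separators(" 09 777 55 888 ")))
```

Leading and trailing spaces survive because they are not enclosed by digits on both sides.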
Using a very simple regex (/\d/ tests for a digit):
str = " 09 777 55 888 "
r = str.index(/\d/)..str.rindex(/\d/)
str[r] = str[r].delete(" -")
p str # => " 0977755888 "
Passing a block to gsub is an option, capture groups available as globals:
>> str = " 09 777 55 888 "
# simple, easy to understand
>> str.gsub(/(^\s+)([\d\s-]+?)(\s+$)/){ "#$1#{$2.delete('- ')}#$3" }
=> " 0977755888 "
# a different take on @steenslag's answer, to avoid using range.
>> s = str.dup; s[/^\s+([\d\s-]+?)\s+$/, 1] = s.delete("- "); s
=> " 0977755888 "
Benchmark, not that it matters that much:
require 'benchmark'
str = " 09 777 55 888 "
n = 1_000_000
puts(Benchmark.bmbm do |x|
# just a match
x.report("match") { n.times {str.match(/^ *\d(?:[ -]*\d){9} *$/) } }
# use regex in []=
x.report("[//]=") { n.times {s = str.dup; s[/^\s+([\d\s-]+?)\s+$/, 1] = s.delete("- "); s } }
# use range in []=
x.report("[..]=") { n.times {s = str.dup; r = s.index(/\d/)..s.rindex(/\d/); s[r] = s[r].delete(" -"); s } }
# block in gsub
x.report("block") { n.times {str.gsub(/(^\s+)([\d\s-]+?)(\s+$)/){ "#$1#{$2.delete('- ')}#$3" }} }
# long regex
x.report("regex") { n.times {str.gsub(/(?:\d+|\G(?!^)\d+)\K[- ]+(?=\d)/, "")} }
end)
Rehearsal -----------------------------------------
match   0.997458   0.000004   0.997462 (  0.998003)
[//]=   1.822698   0.003983   1.826681 (  1.827574)
[..]=   3.095630   0.007955   3.103585 (  3.105489)
block   3.515401   0.003982   3.519383 (  3.521392)
regex   4.761748   0.007967   4.769715 (  4.772972)
------------------------------- total: 14.216826sec

            user     system      total        real
match   1.031670   0.000000   1.031670 (  1.032347)
[//]=   1.859028   0.000000   1.859028 (  1.860013)
[..]=   3.074159   0.003978   3.078137 (  3.079825)
block   3.751532   0.011982   3.763514 (  3.765673)
regex   4.634857   0.003972   4.638829 (  4.641259)

RegEx to match with single occurrence of dash anywhere in [A-Z0-9]+ with total occurrence of 20 chars

I can't figure out a regex that matches a string containing exactly one dash anywhere among [A-Z0-9] characters, with a maximum total length of 20 characters (the dash and the [A-Z0-9] characters together).
This is the closest pattern I could get, but it doesn't work:
([A-Z0-9]{1,19}|\-{1})
Why use a regex, especially a single regex? These conditions are much easier to check separately.
For example, using Perl:
if (length($str) <= 20 && $str =~ /\A[A-Z0-9]*-[A-Z0-9]*\z/)
Another option is to use a positive lookahead that asserts the total length is 1 to 20 characters:
^(?=.{1,20}$)[A-Z0-9]*-[A-Z0-9]*$
Depending on the tool or language, you may want anchors other than ^ and $ (for example \A and \z, as in the Perl snippet above) so that the pattern is tied to the start and end of the whole string rather than of a line.
For example:
let pattern = /^(?=.{1,20}$)[A-Z0-9]*-[A-Z0-9]*$/;
[
"AAAAAAAAAA-AAAAAAAAA",
"-",
"A-A",
"-A",
"A-",
"A",
"AAAAAAAAAAA-AAAAAAAAA",
"AAAAAAAAAAAAAAAAAAAA",
].forEach(s => {
if (pattern.test(s)) {
console.log("Match: '" + s + "' (Nr of chars: " + s.length + ")");
} else {
console.log("No match: '" + s + "' (Nr of chars: " + s.length + ")");
}
});
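Both suggestions (separate checks, or a single regex with a lookahead) can be put side by side in a short Python sketch. The function names are made up for illustration, and \Z is used instead of $ as an assumption that the whole string, not a line, should be checked.

```python
import re
import string

ALLOWED = set(string.ascii_uppercase + string.digits)

# Lookahead limits the total length to 1-20; the rest demands exactly one dash.
LOOKAHEAD = re.compile(r"(?=.{1,20}\Z)[A-Z0-9]*-[A-Z0-9]*")


def matches_with_regex(s: str) -> bool:
    return LOOKAHEAD.fullmatch(s) is not None


def matches_with_checks(s: str) -> bool:
    # Same rules expressed as three separate, easy-to-read conditions.
    return len(s) <= 20 and s.count("-") == 1 and set(s) <= ALLOWED | {"-"}


for sample in ["AAAAAAAAAA-AAAAAAAAA", "-", "A", "AAAAAAAAAAA-AAAAAAAAA"]:
    print(sample, matches_with_regex(sample), matches_with_checks(sample))
```

The two functions are intended to agree on every input; the separate checks are arguably easier to adjust if the length limit or the allowed characters change.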

Specific VBA / VBScript regex related issue (MS Access)

UPDATE: March 12, 2018 1:44pm CST
After creating a website, http://vbfiddle.net, to implement and test @ctwheels's VBScript solution in a browser (MS IE 10+ with Security set to "Medium"; the site has setup instructions, and because vbfiddle.net does not currently have a "save" feature, the code to copy and paste into it is at https://jsfiddle.net/hcwjhmg9/), I found that @ctwheels's VBScript RegEx ran successfully, even for the 3rd example line I gave. But when the same RegEx is used in VBA for Microsoft Access 2016 against a record read from a database with the "same" value, the third subgroup returns only "Ray," for the 3rd example line, when it should return "Ray, CFP" as it does in vbfiddle.net.
It finally occurred to me to iterate through every character of the string value returned by the database (in VBA in Microsoft Access), and compare it to an iteration of every character of the visually-equivalent string value I type directly into the code (in VBA in Microsoft Access). I get the following results:
First Name and Last Name: "G.L. (Pete) Ray, CFP"
--- 1st Text chars: "71 46 76 46 32 40 80 101 116 101 41 32 82 97 121 44"
(Read value from database, appears same as below when Debug.Print is called on it)
--- 2nd Text chars: "71 46 76 46 32 40 80 101 116 101 41 32 82 97 121 44 32 67 70 80" (Typed by keyboard into a string within the code)
'G.L. (Pete) Ray,'
strProperName>objSubMatch: "G.L."
strProperName>objSubMatch: "Pete"
strProperName>objSubMatch: "Ray,"
Matching record(s): 3 of 1132 record(s).
The RegEx I'm running is running against the "1st Text Chars" example, and returns "Ray," for the 3rd subgroup of the previously given 3rd example line: "G.L. (Pete) Ray, CFP". However, if I run the RegEx against the 2nd -- typed directly into code -- "2nd Text chars" example, the 3rd subgroup returns "Ray, CFP" as expected in VBA for Microsoft Access 2016.
I'm now using the RegEx that @ctwheels provided:
^([^(]+?)\s*\(\s*([^)]*?)\s*\)\s*(.*)
Can someone explain what's going on here? 1) Why are the characters returned from the database different from the characters returned from typing the string using a keyboard by reading and copying it visually? 2) How do I make a RegEx that works on the "1st Text Chars" sequence of characters / string return the correct 3rd subgroup: "Ray, CFP" when the value is read directly from the database?
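The character-by-character comparison described above can be sketched in a few lines. This is Python rather than VBA, purely as an illustration; from_database is a hypothetical stand-in built to reproduce the shorter code list shown earlier, not the actual record.

```python
from typing import List, Optional


def code_points(text: str) -> List[int]:
    """Return the numeric code of every character, in order."""
    return [ord(ch) for ch in text]


def first_difference(a: str, b: str) -> Optional[int]:
    """Index of the first position where a and b differ, or None if equal."""
    for index, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return index
    if len(a) != len(b):
        return min(len(a), len(b))  # one string is a prefix of the other
    return None


typed = "G.L. (Pete) Ray, CFP"
from_database = typed[:16]  # hypothetical: mimics the value that stops after "Ray,"

print(code_points(from_database))
print(code_points(typed))
print(first_difference(typed, from_database))
```

A dump like this makes invisible differences (a missing tail, a non-breaking space, a control character) show up as differing numbers or lengths.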
ORIGINAL QUESTION (updated question above):
I'm having problems in VBA using Microsoft Access 2016 with Regex Engine I believe 5.5 for VBScript.
This is the regex expression I'm currently using:
"(.*)\((.*)(\))(.*)"
I'm trying to parse the strings (respectively on each new line):
Lawrence N. (Larry) Solomon
James ( Jim ) Alleman
G.L. (Pete) Ray, CFP
Into:
"Lawrence N.", "Larry", ")", "Solomon"
"James", "Jim", ")", "Alleman"
"G.L.", "Pete", ")", "Ray, CFP"
Or alternatively (and preferably) into:
"Lawrence N.", "Larry", "Solomon"
"James", "Jim", "Alleman"
"G.L.", "Pete", "Ray, CFP"
where the parts within the quotes, separated by commas, are those returned in the submatches (without quotes)
I am using the following code:
' For Proper Name (strProperName):
With objRegex
.Global = False
.MultiLine = False
.IgnoreCase = True
.Pattern = "(.*)\((.*)(\))(.*)"
'([\s|\S]*) work around to match every character?
'".*\(([^\s]*)[\s]*\(" '_
''& "[\"
'[\(][\s]*([.|^\w]*)[\s]*\)"
' "[\s]*(.*)[\s]*\("
' does same as below except matches any or no whitespace preceding any characters,
' and returns the following characters up to an opening parenthesis ("(") but excluding it,
' as the first subgroup
' "(.*)[\s]*\("
' does same as below except matches any whitespace or no whitespace at all followed by an opening parenthesis ("(")
' and returns the preceding characters as the first subgroup
' "(.*)\("
' matches all characters in a row that end with an open parenthesis, and returns all of these characters in a row
' excluding the following open parenthesis as the first subgroup
' "(.*?\(\s)"
' "[^\(]*"
' this pattern returns every character that isn't an opening parenthesis ("("), and when
' it matches an open parenthesis, it does not return it or any characters after it
' "[\(\s](.*)\)"
' this pattern extracts everything between parenthesis in a line as its first submatch
' "(?<=\().*"
' "[^[^\(]*][.*]"
' "(\(.*?\))"
' "(\(.*?\))*([^\(].*[^\)])"
End With
If objRegex.Test(strFirstNameTrimmed) Then
'Set strsMatches = objRegex.Execute(rs.Fields("First Name"))
Set strsMatches = objRegex.Execute(strFirstNameTrimmed)
Debug.Print "2:'" & strsMatches(0).Value & "'"
If strsMatches(0).SubMatches.Count > 0 Then
For Each objSubMatch In strsMatches(0).SubMatches
Debug.Print " strProperName>objSubMatch: """ & objSubMatch & """" 'Result: 000, 643, 888"
strProperName = objSubMatch
Next objSubMatch
End If
Else
strProperName = "*Not Matched*"
End If
Produces the following output in the debug window / "Immediate Window" as it's known in VBA, brought up by (Ctrl+G):
------------------------
First Name and Last Name: "Lawrence N. (Larry) Solomon"
2:'Lawrence N. (Larry)'
strProperName>objSubMatch: "Lawrence N. "
strProperName>objSubMatch: "Larry"
strProperName>objSubMatch: ")"
strProperName>objSubMatch: ""
Extracted Nick Name: "Larry"
Extracted Proper Name: ""
First Name and Last Name: "James ( Jim ) Alleman"
2:'James ( Jim )'
strProperName>objSubMatch: "James "
strProperName>objSubMatch: " Jim "
strProperName>objSubMatch: ")"
strProperName>objSubMatch: ""
Extracted Nick Name: "Jim"
Extracted Proper Name: ""
First Name and Last Name: "G.L. (Pete) Ray, CFP"
2:'G.L. (Pete) Ray,'
strProperName>objSubMatch: "G.L. "
strProperName>objSubMatch: "Pete"
strProperName>objSubMatch: ")"
strProperName>objSubMatch: " Ray,"
Extracted Nick Name: "Pete"
Extracted Proper Name: " Ray,"
Matching record(s): 3 of 1132 record(s).
^([^(]+?)\s*\(\s*([^)]*?)\s*\)\s*(.*)
^ Assert position at the start of the line
([^(]+?) Capture any character except ( one or more times, but as few as possible, into capture group 1
\s* Match any number of whitespace characters
\( Match ( literally
\s* Match any number of whitespace characters
([^)]*?) Capture any character except ) zero or more times, but as few as possible, into capture group 2
\s* Match any number of whitespace characters
\) Match ) literally
\s* Match any number of whitespace characters
(.*) Capture the rest of the line into capture group 3
Results in:
["Lawrence N.", "Larry", "Solomon"]
["James", "Jim", "Alleman"]
["G.L.", "Pete", "Ray, CFP"]
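As a cross-check outside VBScript, the same pattern can be run in Python. This is only a sketch with a hypothetical helper name; it says nothing about how the Access database value differs from the typed one.

```python
import re

# Group 1: text before "(", group 2: text inside "( )", group 3: the rest.
NAME = re.compile(r"^([^(]+?)\s*\(\s*([^)]*?)\s*\)\s*(.*)")


def split_name(full_name: str):
    """Return (proper name, nickname, rest) or None when there is no match."""
    match = NAME.match(full_name)
    return match.groups() if match else None


for line in [
    "Lawrence N. (Larry) Solomon",
    "James ( Jim ) Alleman",
    "G.L. (Pete) Ray, CFP",
]:
    print(split_name(line))
```

The lazy quantifiers combined with the surrounding \s* are what keep the captured groups free of leading and trailing spaces.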
You should be able to avoid using Regex, if that's your thing.
I made some assumptions about the test data that the nickname is contained within "()". Other than that the code should be straightforward, I hope. If not, feel free to ask a question. There is a Test routine called Test included too.
Public Function ParseString(InputString As String) As String
On Error GoTo ErrorHandler:
Dim OutputArray As Variant
Const DoubleQuote As String = """"
'Quick exit, if () aren't found, then just return original text
If InStr(1, InputString, "(") = 0 Or InStr(1, InputString, ")") = 0 Then
ParseString = InputString
Exit Function
End If
'Replace the ) with (, then do a split
OutputArray = Split(Replace(InputString, ")", "("), "(")
'Check the array bounds and output accordingly
'If there can only ever be 3 (0 - 2) elements, then you can change this if statement
If UBound(OutputArray) = 2 Then
ParseString = DoubleQuote & Trim$(OutputArray(0)) & DoubleQuote & ", " & _
DoubleQuote & Trim$(OutputArray(1)) & DoubleQuote & ", " & _
DoubleQuote & Trim$(OutputArray(2)) & DoubleQuote
ElseIf UBound(OutputArray) = 1 Then
ParseString = DoubleQuote & Trim$(OutputArray(0)) & DoubleQuote & ", " & _
DoubleQuote & Trim$(OutputArray(1)) & DoubleQuote
Else
ParseString = DoubleQuote & Trim$(OutputArray(LBound(OutputArray))) & DoubleQuote
End If
CleanExit:
Exit Function
ErrorHandler:
ParseString = InputString
Resume CleanExit
End Function
Sub Test()
Dim Arr() As Variant: Arr = Array("Lawrence N. (Larry) Solomon", "James ( Jim ) Alleman", "G.L. (Pete) Ray, CFP")
For i = LBound(Arr) To UBound(Arr)
Debug.Print ParseString(CStr(Arr(i)))
Next
End Sub
Results
"Lawrence N.", "Larry", "Solomon"
"James", "Jim", "Alleman"
"G.L.", "Pete", "Ray, CFP"
Regex: \s*[()]\s*
Details:
\s* matches any whitespace character between zero and unlimited times
[()] Match a single character present in the list ( or )
VBA code:
Dim str As String
str = "Lawrence N. (Larry) Solomon"
Set re = CreateObject("VBScript.RegExp")
re.Global = True
re.Pattern = "\s*[()]\s*"
re.MultiLine = True
Dim arr As Variant
arr = Strings.Split(re.Replace(str, vbNullChar), vbNullChar)
For Each Match In arr
Debug.Print (Match)
Next
Output:
Lawrence N.
Larry
Solomon
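The split-on-parentheses idea translates almost directly to Python, where re.split can do the replace-then-split in one step. The helper name below is an assumption for illustration.

```python
import re

# A parenthesis together with any whitespace on either side of it.
SEPARATOR = re.compile(r"\s*[()]\s*")


def split_on_parentheses(text: str):
    """Split the text at each "(" or ")" and drop surrounding whitespace."""
    return SEPARATOR.split(text)


print(split_on_parentheses("Lawrence N. (Larry) Solomon"))
print(split_on_parentheses("James ( Jim ) Alleman"))
```

One thing to keep in mind with this approach: if the string ends with ")", the result ends with an empty string, so the caller may want to filter that out.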

RegExp other patterns not working

I continue trying to perform string format matching using RegExp in VBScript & VB6. I am now trying to match a short, single-line string formatted as:
1. Seven characters:
   a. Six alphanumeric plus one "-" OR
   b. Five alphanumeric plus two "-"
2. Three numbers
3. Two letters
4. Literal "65"
5. A two-digit hex number.
Examples include 123456-789LM65F2, 4EF789-012XY65A5, A2345--789AB65D0 & 23456--890JK65D0.
The RegExp pattern ([A-Z0-9\-]{12})([65][A-F0-9]{2}) lumps (1) - (3) together and finds these OK.
However, if I try to:
c) Break (3) out w/ pattern ([A-Z0-9\-]{10})([A-Z]{2})([65][A-F0-9]{2}),
d) Break out both (2) & (3) w/ pattern ([A-Z0-9\-]{7})([0-9]{3})([A-Z]{2})([65][A-F0-9]{2}), or
e) Tighten up (1) with alternation pattern ([A-Z0-9]{5}[-]{2}|[A-Z0-9]{6}[-]{1})([0-9]{3})([A-Z]{2})([65][A-F0-9]{2})
it refuses to find any of them.
What am I doing wrong? Following is a VBScript that runs and checks these.
' VB Script
Main()
Function Main() ' RegEx_Format_sample.vbs
'Uses two patterns, TestPttn for full format accuracy check & SplitPttn
'to separate the two desired pieces
Dim reSet, EtchTemp, arrSplit, sTemp
Dim sBoule, sSlice, idx, TestPttn, SplitPttn, arrMatch
Dim arrPttn(3), arrItems(3), idxItem, idxPttn, Msgtemp
Set reSet = New RegExp
' reSet.IgnoreCase = True ' Not using
' reSet.Global = True ' Not using
' load test case formats to check & split
arrItems(0) = "0,6 nums + 1 '-',123456-789LM65F2"
arrItems(1) = "1,6 chars + 1 '-',4EF789-012XY65A5"
arrItems(2) = "2,5 chars + 2 '-',A2345--789AB65D0"
arrItems(3) = "3,5 nums + 2 '-',23456--890JK65D0"
SplitPttn = "([A-Z0-9]{5,6})[-]{1,2}([A-Z0-9]{9})" ' split pattern has never failed to work
' load the patterns to try
arrPttn(0) = "([A-Z0-9\-]{12})([65][A-F0-9]{2})"
arrPttn(1) = "([A-Z0-9\-]{10}[A-Z]{2})([65][A-F0-9]{2})"
arrPttn(2) = "([A-Z0-9\-]{7})([0-9]{3})([A-Z]{2})([65][A-F0-9]{2})"
arrPttn(3) = "([A-Z0-9]{5}[-]{2}|[A-Z0-9]{6}[-]{1})([0-9]{3})([A-Z]{2})([65][A-F0-9]{2})"
For idxPttn = 0 To 3 ' select Test pattern
TestPttn = arrPttn(idxPttn)
TestPttn = TestPttn & "[%]" ' append % "ender" char
SplitPttn = SplitPttn & "[%]" ' append % "ender" char
For idxItem = 0 To 3
reSet.Pattern = TestPttn ' set to Test pattern
sTemp = arrItems(idxItem )
arrSplit = Split(sTemp, ",") ' arrSplit is Split array
EtchTemp = arrSplit(2) & "%" ' append % "ender" char to Item sub (2) as the "phrase" under test
If reSet.Test(EtchTemp) = False Then
MsgBox("RegEx " & TestPttn & " false for " & EtchTemp & " as " & arrSplit(1) )
Else ' test OK; now switch to SplitPttn
reSet.Pattern = SplitPttn
Set arrMatch = reSet.Execute(EtchTemp) ' run Pttn as Exec this time
If arrMatch.Count > 0 then ' If test OK then Count s/b > 0
Msgtemp = ""
Msgtemp = "RegEx " & TestPttn & " TRUE for " & EtchTemp & " as " & arrSplit(1)
For idx = 0 To arrMatch.Item(0).Submatches.Count - 1
Msgtemp = Msgtemp & Chr(13) & Chr(10) & "Split segment " & idx & " as " & arrMatch.Item(0).submatches.Item(idx)
Next
MsgBox(Msgtemp)
End If ' Count OK
End If ' test OK
Next ' idxItem
Next ' idxPttn
End Function
Try this Regex:
(?:[A-Z0-9]{6}-|[A-Z0-9]{5}--)[0-9]{3}[A-Z]{2}65[0-9A-F]{2}
Explanation:
(?:[A-Z0-9]{6}-|[A-Z0-9]{5}--) - matches either 6 Alphanumeric characters followed by a - or 5 Alphanumeric characters followed by a --
[0-9]{3} - matches 3 Digits
[A-Z]{2} - matches 2 Letters
65 - matches 65 literally
[0-9A-F]{2} - matches 2 HEX symbols
You can get some idea from the following code:
VBScript Code:
Option Explicit
Dim objReg, strTest
strTest = "123456-789LM65F2" 'Change the value as per your requirements. You can also store a list of values in an array and run the code in loop
set objReg = new RegExp
objReg.Global = True
objReg.IgnoreCase = True
objReg.Pattern = "(?:[A-Z0-9]{6}-|[A-Z0-9]{5}--)[0-9]{3}[A-Z]{2}65[0-9A-F]{2}"
if objReg.test(strTest) then
msgbox strTest&" matches with the Pattern"
else
msgbox strTest&" does not match with the Pattern"
end if
set objReg = Nothing
Your patterns do not work because:
([A-Z0-9\-]{12})([65][A-F0-9]{2}) - matches 12 occurrences of either an AlphaNumeric character or - followed by either 6 or 5 followed by 2 HEX characters
([A-Z0-9\-]{10}[A-Z]{2})([65][A-F0-9]{2}) - matches 10 occurrences of either an AlphaNumeric character or - followed by 2 Letters followed by either 6 or 5 followed by 2 HEX characters
([A-Z0-9\-]{7})([0-9]{3})([A-Z]{2})([65][A-F0-9]{2}) - matches 7 occurrences of either an AlphaNumeric character or - followed by 3 digits followed by 2 Letters followed by either 6 or 5 followed by 2 HEX characters
([A-Z0-9]{5}[-]{2}|[A-Z0-9]{6}[-]{1})([0-9]{3})([A-Z]{2})([65][A-F0-9]{2}) - matches either 5 occurrences of an AlphaNumeric character followed by -- or 6 occurrences of an Alphanumeric followed by a -. This is then followed by 3 digits followed by 2 Letters followed by either 6 or 5 followed by 2 HEX characters
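The key point in the breakdown above is that [65] is a character class: it consumes a single character (6 or 5), not the two-character text 65. A short Python sketch makes the difference visible. Whole-string matching via fullmatch is an assumption here, standing in for the "%" end marker used in the question's script. Note also that (?: ... ) is a non-capturing group; the ? in it is not a "zero or one" quantifier.

```python
import re

SAMPLES = [
    "123456-789LM65F2",
    "4EF789-012XY65A5",
    "A2345--789AB65D0",
    "23456--890JK65D0",
]

# The question's alternation pattern: [65] matches ONE character, so the
# pattern describes 15 characters in total and cannot cover a 16-char string.
WITH_CLASS = re.compile(
    r"([A-Z0-9]{5}[-]{2}|[A-Z0-9]{6}[-]{1})([0-9]{3})([A-Z]{2})([65][A-F0-9]{2})"
)

# The literal 65 matches the two characters "6" then "5".
WITH_LITERAL = re.compile(
    r"(?:[A-Z0-9]{6}-|[A-Z0-9]{5}--)[0-9]{3}[A-Z]{2}65[0-9A-F]{2}"
)

for sample in SAMPLES:
    print(
        sample,
        WITH_CLASS.fullmatch(sample) is not None,
        WITH_LITERAL.fullmatch(sample) is not None,
    )
```

This is also consistent with why the first pattern in the question appeared to work: without a start anchor, a 12-character run can begin one position later, which lets [65] land on the 5.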
Try this pattern :
(([A-Z0-9]{5}--)|([A-Z0-9]{6}-))[0-9]{3}[A-Z]{2}65[0-9A-F]{2}
Or, if the last part doesn't like the [A-F]
(([A-Z0-9]{5}--)|([A-Z0-9]{6}-))[0-9]{3}[A-Z]{2}65[0-9ABCDEF]{2}
All, tanx again for your help!!
trincot, everything in each arrItems() between the commas, incl. the "plus", is merely part of a shorthand description of each item's characteristics, such as "5 characters plus 2 dashes".
Gurman, your pttn breakdowns were helpful, but, if I read it right, the addition of the ? prefix is a "Match zero or one occurrences" and this must match exactly one occurrence. Also, my 1st pattern (matches 12) actually DID work for all my test cases.
jNevill, & JMichelB your suggestions are very close to what I ended up with.
I was "over-classing". After some tinkering, I was able to get the Test Pttn to successfully recognize these test cases by taking the [65] out of the [] in my original Alternation pattern. That is I went from ([65]) to (65) and Zammo! it worked.
Orig pattern:
([A-Z0-9]{5}[-]{2}|[A-Z0-9]{6}[-]{1})([0-9]{3})([A-Z]{2})([65][A-F0-9]{2})
Wkg pattern:
([A-Z0-9]{5}[-]{2}|[A-Z0-9]{6}[-]{1})([0-9]{3})([A-Z]{2})(65)([A-F0-9]{2})
Oh, and I moved the
SplitPttn = SplitPttn & "[%]" ' append % "ender" char
stmt up out of the For...Next loop. That helped w/ the splitting.
T-Bone

Regex to match all " - " delimiters in filename except first and last?

I've been trying to write a regex to match all the " - " delimiters in a filename except the first and last, so I can combine all the data in the middle into one group. For example, a filename like:
Ann M Martin - Baby sitters Club - Baby sitters Little Sister - Super Special 04 - Karen, Hannie and Nancy - The Three Musketeers.doc
Has to become:
Ann M Martin - Baby sitters Club- Baby sitters Little Sister- Super Special 04- Karen, Hannie and Nancy - The Three Musketeers.doc
So basically I'm trying to replace " - " with "- ", but not the first or last instance. The filenames can have 1 to 6 " - " delimiters, but only those with 3, 4, 5 or 6 " - " delimiters should be affected.
It's for use in File Renamer; the flavor is JavaScript. Thanks.
Can you not use a regex? If so:
var s = "Ann M Martin - Baby sitters Club - Baby sitters Little Sister - Super Special 04 - Karen, Hannie and Nancy - The Three Musketeers.doc";
var p = s.split(' - ');
var r = ''; // result output
var i = 0;
p.forEach(function(e){
switch(i) {
case 0: r += e; break;
case 1: case p.length - 1: r += ' - ' + e; break;
default: r += '- ' + e;
}
i++;
});
console.log(r);
http://jsfiddle.net/c7zcp8z6/1/
s=Ann M Martin - Baby sitters Club - Baby sitters Little Sister - Super Special 04 - Karen, Hannie and Nancy - The Three Musketeers.doc
r=Ann M Martin - Baby sitters Club- Baby sitters Little Sister- Super Special 04- Karen, Hannie and Nancy - The Three Musketeers.doc
This assumes that the separator is always " - " (1 space, 1 dash, 1 space). If not, you need to split on "-" only, then trim each token before reconstructing.
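The split-and-rebuild idea can be written more compactly. The sketch below is in Python rather than JavaScript, uses a made-up function name, and assumes (as above) that the separator is always exactly " - ".

```python
def merge_middle_fields(filename: str, sep: str = " - ") -> str:
    """Keep the first and last separator, turn the ones in between into "- "."""
    parts = filename.split(sep)
    if len(parts) < 4:  # fewer than 3 separators: leave the name alone
        return filename
    middle = "- ".join(parts[1:-1])
    return parts[0] + sep + middle + sep + parts[-1]


name = (
    "Ann M Martin - Baby sitters Club - Baby sitters Little Sister - "
    "Super Special 04 - Karen, Hannie and Nancy - The Three Musketeers.doc"
)
print(merge_middle_fields(name))
```

The length check implements the requirement that only names with three or more separators are changed.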
Two options:
1 - You'll need to do some processing of your own by iterating through the matches using
( - )
and building a new string (see this post about getting match indices).
You'll have to check that the match count is greater than 2 and skip the first and last matches.
2 - Use
.+ - ((?:.+ - )+).+ - .+
to get the part of the string to be modified and then do a replace on the dashes, then build your string (again using the indices from the above regex).
Thanks for the suggestions.
I got it to work this way: it replaces the first and last " - " with " ! ", so I can then do a simple Find and Replace of all remaining " - " with "- ", then change all the " ! " back to " - ".
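The exact expression used is not shown, so the following is only a guess at the three-step placeholder approach, written in Python with a hypothetical function name. It assumes " ! " never occurs in a real filename and simply returns the name unchanged if it does.

```python
def merge_with_placeholder(filename: str, placeholder: str = " ! ") -> str:
    """Protect the first and last " - ", rewrite the rest, then restore."""
    sep = " - "
    if filename.count(sep) < 3 or placeholder in filename:
        return filename
    # Step 1: mark the first and last separator with the placeholder.
    head, rest = filename.split(sep, 1)
    middle, tail = rest.rsplit(sep, 1)
    marked = head + placeholder + middle + placeholder + tail
    # Step 2: rewrite the remaining separators. Step 3: restore the marked ones.
    return marked.replace(sep, "- ").replace(placeholder, sep)


print(merge_with_placeholder("A - B - C - D - E"))
```

The placeholder trick is handy in tools that only offer plain find-and-replace steps, at the cost of relying on the placeholder being unused.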