Developing Regular Expression to my needs

Developing Regular Expression to my needs - regex

I'm really bad with regular expressions and find them to be too complex. However, I need to use them to do some string manipulation in classic asp.
Input String :
"James John Junior
S.D. Industrial Corpn
D-2341, Focal Point, Phase 4-a,
Sarsona, Penns
Japan
Phone : 92-161-4633248 Fax : 92-161-253214
email : swerte_60#laher.com"
Desired Output string:
"JXXXX JXXX JXXXXX
S.X. IXXXXXXXXX CXXXX
D-XXXX, FXXXX PXXXX, PXXXX 4-X,
SXXXXXX, PXXXX
JXXXX
PXXXX : 9X-XXX-XXXXXXX Fax : 9X-XXX-XXXXXX
eXXXX : sXXXXX_XX#XXXXX.XXX"
Note: We need to split the original string into words based on a single space Then, in those words, we need to replace all letters (lower and upper case) and numbers except for the first character in each word with an "X"
I know its sort of difficult, but a seasoned RegEx expert could nail this pretty easily I would think. No?
Edit:
I've made some progress. Found a function (http://www.addedbytes.com/lab/vbscript-regular-expressions/) that sort of does the job. But needs a little refinement, if anyone can help
function ereg_replace(strOriginalString, strPattern, strReplacement, varIgnoreCase)
' Function replaces pattern with replacement
' varIgnoreCase must be TRUE (match is case insensitive) or FALSE (match is case sensitive)
dim objRegExp : set objRegExp = new RegExp
with objRegExp
.Pattern = strPattern
.IgnoreCase = varIgnoreCase
.Global = True
end with
ereg_replace = objRegExp.replace(strOriginalString, strReplacement)
set objRegExp = nothing
end function
Im calling it like so -
orgstr = ereg_replace(orgstr, "\w", "X", True)
However, the result looks like -
XXXXX XXXXXXXX
XXXXXXXX XXXXXXXX XXX.
XX, XXXXX XXXX, XXXXXX XXXXXX, XXXXXXX XXXXXXX, XXXXXXXXX
XXXXX : XXX-XXX-XXXX
XXX :
XXXXX : XXXXXX#XXXXXX.XX
I'd like this to show the first character in every word. Any help out there?

This approach gets close:
Function AnonymiseWord(m, p, s)
AnonymiseWord = Left(m, 1) & String(Len(m) - 1, "X")
End Function
Function AnonymiseText(input)
Dim rgx: Set rgx = new RegExp
rgx.Global = True
rgx.Pattern = "\b\w+?\b"
AnonymiseText = rgx.Replace(input, GetRef("AnonymiseWord"))
End Function
This might get you close enough to what you need otherwise the basic approach is sound but you may need to fiddle with that pattern to get it match exactly the stretches of text you want to put through AnonymiseWord.

Well, in .NET it would be easy:
resultString = Regex.Replace(subjectString,
#"(?<= # Assert that there is before the current position...
\b # a word boundary
\w # one alphanumeric character (= first letter/digit/underscore)
[\w.#-]* # any number of alnum characters or ., # or -
) # End of lookbehind
[\p{L}\p{N}] # Match any letter or digit to be replaced",
"X", RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace);
The result, though, would be slightly different than what you wrote:
"JXXXX JXXX JXXXXX
S.X. IXXXXXXXXX CXXXX
D-XXXX, FXXXX PXXXX, PXXXX 4-X,
SXXXXXX, PXXXX
JXXXX
PXXXX : 9X-XXX-XXXXXXX FXX : 9X-XXX-XXXXXX
eXXXX : sXXXXX_XX#XXXXX.XXX"
(observe that Fax has also been changed to FXX)
Without .NET, you could try something like
orgstr = ereg_replace("\b(\w)[\w.#-]*", "\1XXXX", True); // not sure about the syntax here, you possibly need double backslashes
which would give you
"JXXXX JXXXX JXXXX
SXXXX IXXXX CXXXX
DXXXX, FXXXX PXXXX, PXXXX 4XXXX,
SXXXX, PXXXX
JXXXX
PXXXX : 9XXXX FXXXX : 9XXXX
eXXXX : sXXXX"
You won't get it better than that with a single regex.

I have no idea about classic ASP, but if it does support (negative) lookbehinds and the only problem is the quantifier in the lookbehind, then why not turn it around and do it this way:
(?<!^)(?<!\s)[a-zA-Z0-9]
and replace with "X".
Means, replace every letter and number if there is not a whitespace or not the start of the string/row before.
See it here on Regexr

Although I love regular expressions, you could do it without them, especially because VBScript does not support look behind.
Dim mystring, myArray, newString, i, j
Const forbiddenChars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
myString = "James John Junior S.D. Industrial Corpn D-2341, Focal Point, Phase 4-a, Sarsona, Penns Japan Phone : 92-161-4633248 Fax : 92-161-253214 email : swerte_60#laher.com"
myArray = split(myString, " ")
For i = lbound(myArray) to ubound(myArray)
newString = left(myArray(i), 1)
For j = 2 to len(myArray(i))
If instr(forbiddenChars, mid(myArray(i), j, 1)) > 0 Then
newString = newString & "X"
else
newString = newString & mid(myArray(i), j, 1)
End If
Next
myArray(i) = newString
Next
myString = join(myArray, " ")
It doesn't cope with the VbNewLine character, but you will get the idea. You can do an extra split on the VbNewLine character, iterate through all elements and split each element on the space for example.

Related

Manipulate string to extract address

I'm currently doing some work with a very large data source on city addresses where the data looks something like this.
137 is the correct address but it belongs in a building that takes up 135-138A on the street.
source:
137 9/F 135-138A KING STREET 135-138A KING STREET TOR
i've used a function which removes the duplicates shown on extendoffice.
the second column has become this:
137 9/F 135-138A KING STREET TOR
what I want to do now is
find address number and add it in front of the street name
remove the numbers that are connected to the dash - ):
9/F 137 KING STREET TOR
Would the the best way to accomplish this?
The main problem I'm having with this is there are many inconsistent spaces in address names ex. "van dyke rd".
Is there anyway I can locate in an array the "-" and set variables for the 2 numbers on either side of the dash and replace it with the correct address number located at the front
Function RemoveDupes2(txt As String, Optional delim As String = " ") As String
Dim x
With CreateObject("Scripting.Dictionary")
.CompareMode = vbTextCompare
For Each x In Split(txt, delim)
If Trim(x) <> "" And Not .exists(Trim(x)) Then .Add Trim(x), Nothing
Next
If .Count > 0 Then RemoveDupes2 = Join(.keys, delim)
End With
End Function
Thanks

Regular Expressions are a way to (amongst other things) search for a feature in a string.
It looks like the feature you are looking for is: number:maybe some spaces : dash : maybe some spaces : number
In regex notation this would be expressed as:
([0-9]*)[ ]*-[ ]*([0-9]*)
Which translates to: Find a sequential group of digits followed by zero or more spaces, then a dash, then zero or more spaces, then some more digits.
The parenthesis indicate the elements that will be returned. So you could assign variables to the be the first number or the second number.
You might need to tweak this if a dash can potentially occur elsewhere in the address.
Further information on actually implementing that is available here: How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops

This meets the case you want, it captures the address range as two separate matches (if you want to process further).
The current code simple removes this range altogether.
What logic is there to move the 9/F to front?
See regex here
Function StripString(strIn As String) As String
Dim objRegex As Object
Set objRegex = CreateObject("vbscript.regexp")
With objRegex
.Pattern = "(\d+[A-C]?)-(\d+[A-C]?)"
If .test(strIn) Then
StripString = .Replace(strIn, vbullstring)
Else
StripString = "No match"
End If
End With
End Function

I'd just:
swap 1st and 2nd substrings
erase the substring with "-" in it
Function RemoveDupes2(txt As String, Optional delim As String = " ") As String
Dim x As Variant, arr As Variant, temp As Variant
Dim iArr As Long
With CreateObject("Scripting.Dictionary")
.CompareMode = vbTextCompare
For Each x In Split(txt, delim)
If Trim(x) <> "" And Not .exists(Trim(x)) Then .Add Trim(x), Nothing
Next
If .count > 0 Then
arr = .keys
temp = arr(0)
arr(0) = arr(1)
arr(1) = temp
For iArr = LBound(arr) To UBound(arr)
If InStr(arr(iArr), "-") <> 0 Then arr(iArr) = ""
Next
RemoveDupes2 = Join(arr, delim)
End If
End With
End Function

Get the third Regex between special char

I have this text:
2|#Favo|Name||26.0000|50.10000|_GRE|||||City|Road||||
I want to capture anything between those special chars: ||
For example, I want to capture "Name" only or I want to capture "City"
I've spent many hours and all I came up with is this regex:
([^|].*[$|])\w+
Here are the required values:
How can I capture one of them?
Thank you.

You may split the string with | removing empty entries and also all those that are blank or consisting only of digits:
Dim strng As String = "2|#Favo|Name||26.0000|50.10000|_GRE|||||City|Road||||"
Dim reslt As List(Of String) = strng.Split(New String() {"|"}, StringSplitOptions.RemoveEmptyEntries).Where(
Function(m) m.All(AddressOf Char.IsDigit) = False And String.Equals(m.Trim(), String.Empty) = False).ToList()
Console.Write(String.Join(", ", reslt))
Output:

VBA and RegEx matching arbitrary strings in Excel 2010

I need to extract adress and potentially zip code as separate entites from the same line. The address line may or may not contain a zip code, and may or may not contain other unwanted strings. This is due to a bug in a web form, which is fixed, but the damage is already done to a set of elements.
Possible forms and results:
Address: Some address 251, 99302 Something Telephone: 555 6798 8473 -- Return "some address 251" and "99302 something" in separate strings. Comma may or may not be trailed by whitespace.
Address: Some address 251 -- Return "some address 251"
Address: Some address 251, 99302 -- Return "some address 251" and "99302". Again, comma may or may not be trailed by whitespace.
I have a basic understanding of how this could be done programatically in VBA by iterating over the string and checking individual characters and substrings, but I feel like it will be time-consuming and not very robust afterwards. Or if it's robust, it would end up being huge because of all the possible variations.
I am struggling the most with how to form the regular expression(s) and possibly the conditionals to get the desired results.
This is part of a larger project, so I won't paste all the various code, but I am pulling mailitems from Outlook to analyze and dump relevant info into an Excel sheet. I have both the Outlook and Excel code working, but the logic that extracts information is a bit flawed.
Here are the new snippets I've been working on:
Function regexp(str As String, regP As String)
Dim rExp As Object, rMatch As Object
Set rExp = CreateObject("vbscript.regexp")
With rExp
.Global = False
.MultiLine = False
.IgnoreCase = True
.Pattern = regP
End With
Set rMatch = rExp.Execute(str)
If rMatch.Count > 0 Then
regexp = rMatch(0)
Else
RegEx = vbNullString
Debug.Print "No match found!"
End If
End Function
Sub regexpAddress(str As String)
Dim result As String
Dim pattern As String
If InStr(str, "Telephone:") Then pattern = "/.+?(?=Telephone:)/"
result = regexp(str, pattern)
End Sub
I'm not sure how to form the regexps here. The one outlined should pull the right information (in 1 string instead 2, but that's still an improvement) - but only when the line contains the string "Telephone:", and I have a lot of cases where it won't contain that.
This is the current and somewhat flawed logic, which for some reason doesn't always yield the results I want:
For Each objMail In olFolder.Items
name = ""
address = ""
telephone = ""
email = ""
vIterations = vIterations + 1
arrBody = Split(objMail.body, Chr(10)) ' Split mail body when linebreak is encountered, throwing each line into its own array position
For i = 0 To UBound(arrBody)
arrLine = Split(arrBody(i), ": ") ' For each element (line), make new array, and if text search matches then write the 2nd half of the element to variable
If InStr(arrBody(i), "Name:") > 0 Then ' L2
name = arrLine(1) ' Reference 2nd column in array after the split
ElseIf InStr(arrBody(i), "Address:") > 0 Then
address = arrLine(1)
ElseIf InStr(arrBody(i), "Telephone:") > 0 Then
telephone = CLng(arrLine(1))
ElseIf InStr(arrBody(i), "Email:") > 0 Then
email = arrLine(1)
End If ' L2
Next
Next ' Next/end-for
This logic accepts and formats input of the following type:
Name: Joe
Address: Road
Telephone: 55555555555555
Email: joe#road.com
and returns joe, road, 55555 and joe#road.com to some defined Excel cells. This works fine when the mailitems are ordered as expected.
Problem: A bug lead to not my webform not inserting a linebreak after the address in some cases. The script still worked for the most part, but the mailitem contents sometimes ended up looking like this:
Name: Joe
Address: Road Telephone: 55555555555555
Email: joe#road.com
The address field was contaminated when it reached Excel ("Road Telephone" instead of just "Road"), but there was no loss of information. Which was acceptable, as it's easy to remove the surpluss string.
But in the following case (no email is entered), the phone number is not only lost but is actually replaced by a phone number from some other, arbitrary mailitem and I can't FOR THE LIFE OF ME figure out (1) why it won't get the correct number, (2) why it jumps to a new mail item to find the phone number or (3) how it selects this other mailitem:
Name: Joe
Address: Road Telephone: 5555555555555
Email:
In Excel:
Name: Joe
Address: Road Telephone
Telephone: 8877445511
Email:
So, TL;DR: my selection logic is flawed, and being that it is so hastily hacked together, not to mention how it yields false information and I am unable to figure out how and why, I would like to do a better operation using some other solution (like regexp?) instead for a more robust code.

Not so long ago I had a similar problem.
Code may not be very professional, but it can be helpful :)
Could you check if this code work for you correctly?
Function regexp(str As String, regP As String)
Dim rExp As Object, rMatch As Object
Set rExp = CreateObject("vbscript.regexp")
With rExp
.Global = False
.MultiLine = False
.IgnoreCase = True
.pattern = regP
End With
Set rMatch = rExp.Execute(str)
If rMatch.Count > 0 Then
regexp = rMatch(0)
Else
RegEx = vbNullString
Debug.Print "No match found!"
End If
End Function
Function for_vsoraas()
For Each objMail In olFolder.Items
vIterations = vIterations + 1
objMail_ = Replace(objMail.body, Chr(10), " ")
Dim StringToSearch(3) As String
StringToSearch(0) = "Name:"
StringToSearch(1) = "Address:"
StringToSearch(2) = "Telephone:"
StringToSearch(3) = "Email:"
Dim ArrResults(4) As String 'name,address,telephone,email, zipcode
For i = 0 To UBound(StringToSearch)
ResultString = ""
StartString = InStr(objMail_, StringToSearch(i))
If StartString > 0 Then
If i = UBound(StringToSearch) Then 'last string to search, dont search EndString
ResultString = Right(objMail_, Len(objMail_) + Len(StringToSearch(i)))
Else
EndString = 0
j = i
While (EndString = 0) 'prevent case no existing EndString
EndString = InStr(objMail_, StringToSearch(j + 1))
j = j + 1
If j = UBound(StringToSearch) And EndString = 0 Then
EndString = Len(objMail_) + 1
End If
Wend
ResultString = Mid(objMail_, StartString + Len(StringToSearch(i)) + 1, EndString - 1 - StartString - Len(StringToSearch(i)))
End If
ArrResults(i) = ResultString
End If
Next i
'search zipcode and address
ArrResults(4) = regexp(ArrResults(1), "\b(\d{5})\b")
ArrResults(1) = regexp(ArrResults(1), "([a-z ]{2,}\s{0,1}\d{0,3})")
'your varabile
Name = ArrResults(0)
Address = ArrResults(1)
Telephone = ArrResults(2)
Email = ArrResults(3)
ZipCode = ArrResults(4)
Next ' Next/end-for
End Function

I don't know if it was dumb luck or if I actually managed to learn some regex, but these patterns turn out to do exactly what I need.
' regex patterns - use flag /i
adrPattern = "([a-z ]{2,}\s{0,1}\d{0,3})" ' Select from a-z or space, case insensitive and at least 2 characters long, followed by optional space, ending with 0-3 digits
adrZipcode = "\b(\d{4})\b" ' Exactly 4 digits surrounded on both sides by either space, text or non-word character like comma
Edit: "Fixed" the telephone problem too. After spending 2 hours trying to write it in regex, and failing miserably, it dawned on me that solving the problem as a matter of faulty creation of the array had to be so much easier than treating it as a computational problem. And it was:
mailHolder = Replace(objMail.body, "Telephone:", Chr(10) + "Telephone:")
arrBody = Split(mailHolder, Chr(10))

Regex - Quantifier {x,y} following nothing

I'm creating a basic text editor and I'm using regex to achieve a find and replace function. To do this I've gotten this code:
Private Function GetRegExpression() As Regex
Dim result As Regex
Dim regExString As [String]
' Get what the user entered
If TabControl1.SelectedIndex = 0 Then
regExString = txtbx_Find2.Text
ElseIf TabControl1.SelectedIndex = 1 Then
regExString = txtbx_Find.Text
End If
If chkMatchCase.Checked Then
result = New Regex(regExString)
Else
result = New Regex(regExString, RegexOptions.IgnoreCase)
End If
Return result
End Function
And this is the Find method
Private Sub FindText()
''
Dim WpfTest1 As New Spellpad.Tb
Dim ElementHost1 As System.Windows.Forms.Integration.ElementHost = frm_Menu.Controls("ElementHost1")
Dim TheTextBox As System.Windows.Controls.TextBox = CType(ElementHost1.Child, Tb).ctrl_TextBox
''
' Is this the first time find is called?
' Then make instances of RegEx and Match
If isFirstFind Then
regex = GetRegExpression()
match = regex.Match(TheTextBox.Text)
isFirstFind = False
Else
' match.NextMatch() is also ok, except in Replace
' In replace as text is changing, it is necessary to
' find again
'match = match.NextMatch();
match = regex.Match(TheTextBox.Text, match.Index + 1)
End If
' found a match?
If match.Success Then
' then select it
Dim row As Integer = TheTextBox.GetLineIndexFromCharacterIndex(TheTextBox.CaretIndex)
MoveCaretToLine(TheTextBox, row + 1)
TheTextBox.SelectionStart = match.Index
TheTextBox.SelectionLength = match.Length
Else
If TabControl1.SelectedIndex = 0 Then
MessageBox.Show([String].Format("Cannot find ""{0}"" ", txtbx_Find2.Text), Application.ProductName, MessageBoxButtons.OK, MessageBoxIcon.Information)
ElseIf TabControl1.SelectedIndex = 1 Then
MessageBox.Show([String].Format("Cannot find ""{0}"" ", txtbx_Find.Text), Application.ProductName, MessageBoxButtons.OK, MessageBoxIcon.Information)
End If
isFirstFind = True
End If
End Sub
When I run the program I get errors:
For ?, parsing "?" - Quantifier {x,y} following nothing.; and
For *, parsing "*" - Quantifier {x,y} following nothing.
It's as if I can't use these but I really need to. How can I solve this problem?

? and * are quantifiers in regular expressions:
? is used to specify that something is optional, for instance b?au can match both bau and au.
* means the group with which it binds can be repeated zero, one or multiple times: for instance ba*u can bath bu, bau, baau, baaaaaaaau,...
Now most regular expressions use {l,u} as a third pattern with l the lower bound on the number of times something is repeated, and u the upper bound on the number of occurences. So ? is replaced by {0,1} and * by {0,}.
Now if you provide them without any character before them, evidently, the regex parser doesn't know what you mean. In other words if you do (used csharp, but the ideas are generally applicable):
$ csharp
Mono C# Shell, type "help;" for help
Enter statements below.
csharp> Regex r = new Regex("fo*bar");
csharp> r.Replace("Fooobar fooobar fbar fobar","<MATCH>");
"Fooobar <MATCH> <MATCH> <MATCH>"
csharp> r.Replace("fooobar far qux fooobar quux fbar echo fobar","<MATCH>");
"<MATCH> far qux <MATCH> quux <MATCH> echo <MATCH>"
If you wish to do a "raw text find and replace", you should use string.Replace.
EDIT:
Another way to process them is by escaping special regex characters. Ironically enough, you can do this by replacing them by a regex ;).
Private Function GetRegExpression() As Regex
Dim result As Regex
Dim regExString As [String]
' Get what the user entered
If TabControl1.SelectedIndex = 0 Then
regExString = txtbx_Find2.Text
ElseIf TabControl1.SelectedIndex = 1 Then
regExString = txtbx_Find.Text
End If
'Added code
Dim baseRegex As Regex = new Regex("[\\.$^{\[(|)*+?]")
regExString = baseRegex.Replace(regExString,"\$0")
'End added code
If chkMatchCase.Checked Then
result = New Regex(regExString)
Else
result = New Regex(regExString, RegexOptions.IgnoreCase)
End If
Return result
End Function

How to extract substring in parentheses using Regex pattern

This is probably a simple problem, but unfortunately I wasn't able to get the results I wanted...
Say, I have the following line:
"Wouldn't It Be Nice" (B. Wilson/Asher/Love)
I would have to look for this pattern:
" (<any string>)
In order to retrieve:
B. Wilson/Asher/Love
I tried something like "" (([^))]*)) but it doesn't seem to work. Also, I'd like to use Match.Submatches(0) so that might complicate things a bit because it relies on brackets...

Edit: After examining your document, the problem is that there are non-breaking spaces before the parentheses, not regular spaces. So this regex should work: ""[ \xA0]*\(([^)]+)\)
"" 'quote (twice to escape)
[ \xA0]* 'zero or more non-breaking (\xA0) or a regular spaces
\( 'left parenthesis
( 'open capturing group
[^)]+ 'anything not a right parenthesis
) 'close capturing group
\) 'right parenthesis
In a function:
Public Function GetStringInParens(search_str As String)
Dim regEx As New VBScript_RegExp_55.RegExp
Dim matches
GetStringInParens = ""
regEx.Pattern = """[ \xA0]*\(([^)]+)\)"
regEx.Global = True
If regEx.test(search_str) Then
Set matches = regEx.Execute(search_str)
GetStringInParens = matches(0).SubMatches(0)
End If
End Function

Not strictly an answer to your question, but sometimes, for things this simple, good ol' string functions are less confusing and more concise than Regex.
Function BetweenParentheses(s As String) As String
BetweenParentheses = Mid(s, InStr(s, "(") + 1, _
InStr(s, ")") - InStr(s, "(") - 1)
End Function
Usage:
Debug.Print BetweenParentheses("""Wouldn't It Be Nice"" (B. Wilson/Asher/Love)")
'B. Wilson/Asher/Love
EDIT #alan points our that this will falsely match the contents of parentheses in the song title. This is easily circumvented with a little modification:
Function BetweenParentheses(s As String) As String
Dim iEndQuote As Long
Dim iLeftParenthesis As Long
Dim iRightParenthesis As Long
iEndQuote = InStrRev(s, """")
iLeftParenthesis = InStr(iEndQuote, s, "(")
iRightParenthesis = InStr(iEndQuote, s, ")")
If iLeftParenthesis <> 0 And iRightParenthesis <> 0 Then
BetweenParentheses = Mid(s, iLeftParenthesis + 1, _
iRightParenthesis - iLeftParenthesis - 1)
End If
End Function
Usage:
Debug.Print BetweenParentheses("""Wouldn't It Be Nice"" (B. Wilson/Asher/Love)")
'B. Wilson/Asher/Love
Debug.Print BetweenParentheses("""Don't talk (yell)""")
' returns empty string
Of course this is less concise than before!

This a nice regex
".*\(([^)]*)
In VBA/VBScript:
Dim myRegExp, ResultString, myMatches, myMatch As Match
Dim myRegExp As RegExp
Set myRegExp = New RegExp
myRegExp.Pattern = """.*\(([^)]*)"
Set myMatches = myRegExp.Execute(SubjectString)
If myMatches.Count >= 1 Then
Set myMatch = myMatches(0)
If myMatch.SubMatches.Count >= 3 Then
ResultString = myMatch.SubMatches(3-1)
Else
ResultString = ""
End If
Else
ResultString = ""
End If
This matches
Put Your Head on My Shoulder
in
"Don't Talk (Put Your Head on My Shoulder)"
Update 1
I let the regex loose on your doc file and it matches as requested. Quite sure the regex is fine. I'm not fluent in VBA/VBScript but my guess is that's where it goes wrong
If you want to discuss the regex some further that's fine with me. I'm not eager to start digging into this VBscript API which looks arcane.
Given the new input the regex is tweaked to
".*".*\(([^)]*)
So that it doesn't falsely match (Put Your Head on My Shoulder) which appears inside the quotes.

This function worked on your example string:
Function GetArtist(songMeta As String) As String
Dim artist As String
' split string by ")" and take last portion
artist = Split(songMeta, "(")(UBound(Split(songMeta, "(")))
' remove closing parenthesis
artist = Replace(artist, ")", "")
End Function
Ex:
Sub Test()
Dim songMeta As String
songMeta = """Wouldn't It Be Nice"" (B. Wilson/Asher/Love)"
Debug.Print GetArtist(songMeta)
End Sub
prints "B. Wilson/Asher/Love" to the Immediate Window.
It also solves the problem alan mentioned. Ex:
Sub Test()
Dim songMeta As String
songMeta = """Wouldn't (It Be) Nice"" (B. Wilson/Asher/Love)"
Debug.Print GetArtist(songMeta)
End Sub
also prints "B. Wilson/Asher/Love" to the Immediate Window. Unless of course, the artist names also include parentheses.

This another Regex tested with a vbscript (?:\()(.*)(?:\)) Demo Here
Data = """Wouldn't It Be Nice"" (B. Wilson/Asher/Love)"
wscript.echo Extract(Data)
'---------------------------------------------------------------
Function Extract(Data)
Dim strPattern,oRegExp,Matches
strPattern = "(?:\()(.*)(?:\))"
Set oRegExp = New RegExp
oRegExp.IgnoreCase = True
oRegExp.Pattern = strPattern
set Matches = oRegExp.Execute(Data)
If Matches.Count > 0 Then Extract = Matches(0).SubMatches(0)
End Function
'---------------------------------------------------------------

I think you need a better data file ;) You might want to consider pre-processing the file to a temp file for modification, so that outliers that don't fit your pattern are modified to where they'll meet your pattern. It's a bit time consuming to do, but it is always difficult when a data file lacks consistency.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Developing Regular Expression to my needs - regex

Related

Manipulate string to extract address

Get the third Regex between special char

VBA and RegEx matching arbitrary strings in Excel 2010

Regex - Quantifier {x,y} following nothing

How to extract substring in parentheses using Regex pattern

Categories

Resources