Regex to find words and wrap quotes - regex

I am trying to find words with spaces that are surrounded by (, ) or , and wrap them in quotes..
For e.g. In this expression - Development life cycle and enterprise service bus are to be wrapped in quotes.
Edit - Only phrases i.e. Words that contain spaces between them are to be wrapped
(AND(OR(SDLC,development life cycle),design,requirements,OR(biztalk,Websphere,TIBCO,Webmethods,ESB,enterprise service bus)))

(?<=[(,])([^(),]* [^(),]*)(?=[),])
Try this. See DEMO.
Replace by "$1" or "\1"
string strRegex = #"(?<=[(,])([^(),]* [^(),]*)(?=[),])";
Regex myRegex = new Regex(strRegex, RegexOptions.Multiline);
string strTargetString = #"(AND(OR(SDLC,development life cycle),design,requirements,OR(biztalk,Websphere,TIBCO,Webmethods,ESB,enterprise service bus)))" + "\n" + #" AND(OR(SDLC,""development life cycle""),OR(banking,AML,anti-money laundering,KYC,know your customer),OR(technology strategy,technical strategy,technical architecture,technology architecture,architect*)";
string strReplace = #"""$1""";
return myRegex.Replace(strTargetString, strReplace);

Related

VBScript RegEx - match between words

I'm having a hard time coming up with a working RegEx that words in VBScript. I'm trying to match all text between 2 keywords:
(?<=key)(.*)(?=Id)
This throws a RegEx error in VBScript. Id
Blob I'm matching against:
\"key\":[\"food\",\"real\",\"versus\",\"giant\",\"giant gummy\",\"diy candy\",\"candy\",\"gummy worm\",\"pizza\",\"fries\",\"spooky diy science\",\"spooky\",\"trapped\"],\"Id\"
Ideally, I'd end up with a comma delimited list like this:
food,real,versus,giant,giant gummy,diy candy,candy,gummy worm,pizza,fries,spooky diy science,spooky,trapped
but, I'd settle for all text between 2 keywords working in VBScript.
Thanks in advance!
VBScript's regular expression engine doesn't support lookbehind assertions, so you'll want to do something like this instead:
s = "\""key\"":[\""food\"",\""real\"",\""trapped\""],\""Id\"""
'remove backslashes and double quotes from string
s1 = Replace(s, "\", "")
s1 = Replace(s1, Chr(34), "")
Set re = New RegExp
re.Pattern = "key:\[(.*?)\],Id"
For Each m In re.Execute(s1)
list = m.Submatches(0)
Next
WScript.Echo list

How to mimic regular Expression negative lookbehind?

What I'm trying to accomplish
I'm trying to create a function to use string interpolation within VBA. The issue I'm having is that I'm not sure how to replace "\n" with a vbNewLine, as long as it does not have the escape character "\" before it?
What I have found and tried
VBScript does not have a negative look behind as far as I can research.
Below has two examples of Patterns that I have already tried:
Private Sub testingInjectFunction()
Dim dict As New Scripting.Dictionary
dict("test") = "Line"
Debug.Print Inject("${test}1\n${test}2 & link: C:\\notes.txt", dict)
End Sub
Public Function Inject(ByVal source As String, dict As Scripting.Dictionary) As String
Inject = source
Dim regEx As Object
Set regEx = CreateObject("VBScript.RegExp")
regEx.Global = True
' PATTERN # 1 REPLACES ALL '\n'
'regEx.Pattern = "\\n"
' PATTERN # 2 REPLACES EXTRA CHARACTER AS LONG AS IT IS NOT '\'
regEx.Pattern = "[^\\]\\n"
' REGEX REPLACE
Inject = regEx.Replace(Inject, vbNewLine)
' REPLACE ALL '${dICT.KEYS(index)}' WITH 'dICT.ITEMS(index)' VALUES
Dim index As Integer
For index = 0 To dict.Count - 1
Inject = Replace(Inject, "${" & dict.Keys(index) & "}", dict.Items(index))
Next index
End Function
Desired result
Line1
Line2 & link: C:\notes.txt
Result for Pattern # 1: (Replaces when not wanted)
Line1
Line2 & link: C:\
otes.txt
Result for Pattern # 2: (Replaces the 1 in 'Line1')
Line
Line2 & link: C:\\notes.txt
Summary question
I can easily write code that doesn't use Regular Expressions that can achieve my desired goal but want to see if there is a way with Regular Expressions in VBA.
How can I use Regular Expressions in VBA to Replace "\n" with a vbNewLine, as long as it does not have the escape character "\" before it?
Yes, you may use a regex here. Since the backslash is not used to escape itself in these strings, you may modify your solution like this:
regEx.Pattern = "(^|[^\\])\\n"
S = regEx.Replace(S, "$1" & vbNewLine)
It will match and capture any char but \ before \n and then will put it back with the $1 placeholder. As there is a chance that \n appears at the start of the string, ^ - the start of string anchor - is added as an alternative into the capturing group.
Pattern details
(^|[^\\]) - Capturing group 1: start of string (^) or (|) any char but a backslash ([^\\])
\\ - a backslash
n - a n char.

regex .NET to find and replace underscores only if found between > and <

I have a list of strings looking like this:
Title_in_Title_by_-_Mr._John_Doe
and I need to replace the _ with a SPACE from the text between the html"> and </a> ONLY.
so that the result to look like this:
Title in Title by - Mr. John Doe
I've tried to do it in 2 steps:
first isolate that part only with .*html">(.*)<\/a.* & ^.*>(.*)<.* & .*>.*<.* or ^.*>.*<.*
and then do the replace but the return is always unchanged and now I'm stuck.
Any help to accomplish this is much appreciated
How I would do it is to .split it and then .replace it, no need for regex.
Dim line as string = "Title_in_Title_by_-_Mr._John_Doe"
Dim split as string() = line.split(">"c)
Dim correctString as String = split(1).replace("_"c," "c)
Boom done
here is the string.replace article
Though if you had to use regex, this would probably be a better way of doing it
Dim inputString = "Title_in_Title_by_-_Mr._John_Doe"
Dim reg As New Regex("(?<=\>).*?(?=\<)")
Dim correctString = reg.match(inputString).value.replace("_"c, " "c)
Dim line as string = "Title_and_Title_by_-_Mr._John_Doe"
line = Regex.Replace(line, "(?<=\.html"">)[^<>]+(?=</a>)", _
Function (m) m.Value.Replace("_", " "))
This uses a regex with lookarounds to isolate the title, and a MatchEvaluator delegate in the form of a lambda expression to replace the underscores in the title, then it plugs the result back into the string.

VBA Regular Expressions - Run-Time Error 91 when trying to replace characters in string

I am doing this task as part of a larger sub in order to massively reduce the workload for a different team.
I am trying to read in a string and use Regular Expressions to replace one-to-many spaces with a single space (or another character). At the moment I am using a local string, however in the main sub this data will come from an external .txt file. The number of spaces between elements in this .txt can vary depeneding on the row.
I am using the below code, and replacing the spaces with a dash. I have tried different variations and different logic on the below code, but always get "Run-time error '91': Object Variable or with clock variable not set" on line "c = re.Replace(s, replacement)"
After using breakpoints, I have found out that my RegularExpression (re) is empty, but I can't quite figure out how to progress from here. How do I replace my spaces with dashes? I have been at this problem for hours and spent most of that time on Google to see if someone has had a similar issue.
Sub testWC()
Dim s As String
Dim c As String
Dim re As RegExp
s = "hello World"
Dim pattern As String
pattern = "\s+"
Dim replacement As String
replacement = "-"
c = re.Replace(s, replacement)
Debug.Print (c)
End Sub
Extra information: Using Excel 2010. Have successfully linked all my references (Microsoft VBScript Regular Expressions 5.5". I was sucessfully able to replace the spaces using the vanilla "Replace" function, however as the number of spaces between elements vary I am unable to use that to solve my issue.
Ed: My .txt file is not fixed either, there are a number of rows that are different lengths so I am unable to use the MID function in excel to dissect the string either
Please help
Thanks,
J.H.
You're not setting up the RegExp object correctly.
Dim pattern As String
pattern = "\s+" ' pattern is just a local string, not bound to the RegExp object!
You need to do this:
Dim re As RegExp
Set re = New RegExp
re.Pattern = "\s+" ' Now the pattern is bound to the RegExp object
re.Global = True ' Assuming you want to replace *all* matches
s = "hello World"
Dim replacement As String
replacement = "-"
c = re.Replace(s, replacement)
Try setting the pattern inside your Regex object. Right now, re is just a regex with no real pattern assigned to it. Try adding in re.Pattern = pattern after you initialize your pattern string.
You initialized the pattern but didn't actually hook it into the Regex. When you ended up calling replace it didn't know what it was looking for pattern wise, and threw the error.
Try also setting the re as a New RegExp.
Sub testWC()
Dim s As String
Dim c As String
Dim re As RegExp
Set re = New RegExp
s = "hello World"
Dim pattern As String
pattern = "\s+"
re.Pattern = pattern
Dim replacement As String
replacement = "-"
c = re.Replace(s, replacement)
Debug.Print (c)
End Sub

Whole word replacements using Regular Expression

I have a list of original words and replace with words which I want to replace occurrence of the original words in some sentences to the replace words.
For example my list:
theabove the above
myaddress my address
So the sentence "This is theabove." will become "This is the above."
I am using Regular Expression in VB like this:
Dim strPattern As String
Dim regex As New RegExp
regex.Global = True
If Not IsEmpty(myReplacementList) Then
For intRow = 0 To UBound(myReplacementList, 2)
strReplaceWith = IIf(IsNull(myReplacementList(COL_REPLACEMENTWORD, intRow)), " ", varReplacements(COL_REPLACEMENTWORD, intRow))
strPattern = "\b" & myReplacementList(COL_ORIGINALWORD, intRow) & "\b"
regex.Pattern = strPattern
TextToCleanUp = regex.Replace(TextToReplace, strReplaceWith)
Next
End If
I loop all entries in my list myReplacementList against the text TextToReplace I want to process, and the replacement have to be whole word so I used the "\b" token around the original word.
It works well but I have a problem when the original words contain some special characters for example
overla) overlay
I try to escape the ) in the pattern but it does not work:
\boverla\)\\b
I can't replace the sentence "This word is overla) with that word." to "This word is overlay with that word."
Not sure what is missing? Is regular expression the way to the above scenario?
I'd use string.replace().
That way you don't have to escape special chars .. only these: ""!
See here for examples: http://www.dotnetperls.com/replace-vbnet
Regex is good if your looking for patterns. Or renaming your mp3 collection ;-) and much, much more. But in your case, I'd use string.replace().