Writing Regex that selects a VBScript Class and its Name - regex

I am writing a regex that selects a VBScript class and its name. At the moment, here is how it looks.
Regex:
Class\s*[a-zA-Z0-9_]*
Live Code Demo: http://regexr.com/3bco8
It works, but I want to modify it so that it selects only "Class Person" and not the word "class" at the end of the text.

Both * quantifiers should be replaced with + (one or more of). Using the .Multiline flag, anchoring at start, and capturing the class name may be a good idea. Whether you want to allow leading whitespace before Class is up to you:
Option Explicit
Dim s : s = Join(Array( _
      "' Class ForgetIt" _
    , "whatever" _
    , " Class FindIt" _
    , "whatever" _
    , "End Class ' FindIt" _
    , "s = ""Class NotMe""" _
    , "CLASS MeToo" _
    , "End Class" _
), vbCrLf)
Dim r : Set r = New RegExp
r.Global = True
r.IgnoreCase = True
r.Multiline = True
' ^ anchors at each line start; (\w+) captures the class name
r.Pattern = "^\s*Class\s+(\w+)"
Dim m
For Each m In r.Execute(s)
    WScript.Echo m.SubMatches(0)
Next
output:
cscript 31398009.vbs
FindIt
MeToo

Related

Regex to replace word except in comments

How can I modify my regex so that it will ignore the comments in the pattern in a language that doesn't support lookbehind?
My regex pattern is:
\b{Word}\b(?=([^"\\]*(\\.|"([^"\\]*\\.)*[^"\\]*"))*[^"]*$)
\b{Word}\b : Whole word, {word} is replaced iteratively for the vocab list
(?=([^"\\]*(\\.|"([^"\\]*\\.)*[^"\\]*"))*[^"]*$) : Don't replace anything inside of quotes
My goal is to lint variables and words so that they always have the same case. However I do not want to lint any words in a comment. (The IDE sucks and there is no other option)
Comments in this language are prefixed by an apostrophe. Sample code follows
' This is a comment
This = "Is not" ' but this is
' This is a comment, what is it's value?
Object.value = 1234 ' Set value
value = 123
Basically I want the linter to take the above code and say for the word "value" update it to:
' This is a comment
This = "Is not" ' but this is
' This is a comment, what is it's value?
Object.Value = 1234 ' Set value
Value = 123
All occurrences of "Value" in code should be updated, but nothing inside double quotes, inside a comment, or inside another word such as valueadded should be touched.
I've tried several solutions but haven't been able to get it to work.
['.*] : Not preceding an apostrophe
(?<!\s*') : Negative lookbehind: not preceded by optional spaces and an apostrophe
(?<!\s*') : Second example seemed incorrect, and it won't work anyway as the language doesn't support lookbehind
Anybody have any ideas how I can alter my pattern so that I don't edit commented variables?
VBA
Sub TestSO()
Dim Code As String
Dim Expected As String
Dim Actual As String
Dim Words As Variant
Code = "item = object.value ' Put item in value" & vbNewLine & _
"some.item <> some.otheritem" & vbNewLine & _
"' This is a comment, what is it's value?" & vbNewLine & _
"Object.value = 1234 ' Set value" & vbNewLine & _
"value = 123" & vbNewLine
Expected = "Item = object.Value ' Put item in value" & vbNewLine & _
"some.Item <> some.otheritem" & vbNewLine & _
"' This is a comment, what is it's value?" & vbNewLine & _
"Object.Value = 1234 ' Set value" & vbNewLine & _
"Value = 123" & vbNewLine
Words = Array("Item", "Value")
Actual = SOLint(Words, Code)
Debug.Print Actual = Expected
Debug.Print "CODE: " & vbNewLine & Code
Debug.Print "Actual: " & vbNewLine & Actual
Debug.Print "Expected: " & vbNewLine & Expected
End Sub
Public Function SOLint(ByVal Words As Variant, ByVal FileContents As String) As String
Const NotInQuotes As String = "(?=([^""\\]*(\\.|""([^""\\]*\\.)*[^""\\]*""))*[^""]*$)"
Dim RegExp As Object
Dim Regex As String
Dim Index As Variant
Set RegExp = CreateObject("VBScript.RegExp")
With RegExp
.Global = True
.IgnoreCase = True
End With
For Each Index In Words
Regex = "[('*)]\b" & Index & "\b" & NotInQuotes
RegExp.Pattern = Regex
FileContents = RegExp.Replace(FileContents, Index)
Next Index
SOLint = FileContents
End Function
As discussed in the comments above:
((?:\".*\")|(?:'.*))|\b(v)(alue)\b
The regex has three parts, combined with alternation:
A non-capturing group for text within double quotes, as we don't need that.
A non-capturing group for text starting with a single quote (a comment).
Finally the string "value" is split into two parts, (v) and (alue). The group around the first two alternatives is $1, so while replacing we can use \U$2 to convert v to V and \E$3 to emit the rest as is, where \U converts to upper case and \E turns the case conversion off.
\b \b - word boundaries keep the pattern from matching "value" inside a longer word such as valueadded.
https://regex101.com/r/mD9JeR/8
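As far as I know, the Replace method of VBScript.RegExp has no \U and \E case operators, so the replacement above cannot be pasted in as is. A minimal sketch of the same alternation idea using Execute instead: strings and comments are matched so that they can be copied through untouched, and only a bare word match is rewritten. LintWord and its parameter names are hypothetical, and the sketch assumes the word is a plain identifier with no regex metacharacters.

```vbscript
Option Explicit

' Hypothetical helper (not from the original answer): gives every whole-word
' occurrence of sWord the casing of sWord, skipping "..." strings and ' comments.
' Assumes sWord is a plain identifier with no regex metacharacters.
Function LintWord(sCode, sWord)
    Dim r : Set r = CreateObject("VBScript.RegExp")
    r.Global = True
    r.IgnoreCase = True
    ' Strings and comments are matched (and consumed) as a whole, so the word
    ' inside them is never seen on its own.
    r.Pattern = """[^""\r\n]*""|'.*|\b" & sWord & "\b"

    Dim sOut : sOut = ""
    Dim nPos : nPos = 1            ' 1-based position of the next uncopied character
    Dim m
    For Each m In r.Execute(sCode)
        ' Copy the text between the previous match and this one unchanged.
        sOut = sOut & Mid(sCode, nPos, m.FirstIndex + 1 - nPos)
        If Left(m.Value, 1) = """" Or Left(m.Value, 1) = "'" Then
            sOut = sOut & m.Value   ' string literal or comment: keep as is
        Else
            sOut = sOut & sWord     ' code: apply the canonical casing
        End If
        nPos = m.FirstIndex + 1 + m.Length
    Next
    LintWord = sOut & Mid(sCode, nPos)
End Function

Dim sCode : sCode = "Object.value = 1234 ' Set value" & vbCrLf & "value = ""value"""
WScript.Echo LintWord(sCode, "Value")
```

The function only relies on CreateObject, so it should also work from VBA with WScript.Echo swapped for Debug.Print; calling it once per word in the vocab list mirrors the loop in SOLint.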

Regular expression for an Excel cell with R1C1 notation

I need some code to test if a cell contains a formula with a reference to another cell.
I found the answer "Find all used references in Excel formula", but that solution also wrongly matches formulas with references to table columns, such as:
=SearchValInCol2(Tabella1[articolo];[#articolo];Tabella1[b])
Then, I wrote the following VBA code using the Like operator, but surely a solution with a regular expression would be more solid (I think the following code won't work in many scenarios).
Private Function TestIfCellContainsAFormula(cellToTest As Variant) As Boolean
    Dim result As Object
    Dim r As Range
    Dim testExpression As String
    Dim objRegEx As Object
    Set r = cellToTest ' INPUT THE CELL HERE , e.g. RANGE("A1")
    Set objRegEx = CreateObject("VBScript.RegExp")
    objRegEx.IgnoreCase = True
    objRegEx.Global = True
    objRegEx.Pattern = """.*?""" ' remove expressions
    testExpression = CStr(r.FormulaR1C1)
    ' search for pattern "=R[-3]C+4"
    If testExpression Like "*R[[]*[]]*C*" Then
        TestIfCellContainsAFormula = True
        Exit Function
    End If
    ' search for pattern "=RC[2]"
    If testExpression Like "*R*C[[]*[]]*" Then
        'If InStr(1, testExpression, "C[", vbTextCompare) <> 0 Then
        TestIfCellContainsAFormula = True
        Exit Function
    End If
    TestIfCellContainsAFormula = False
End Function
Option 1
To match R1C1 style references you can use this regex:
R(\[-?\d+\])C(\[-?\d+\])|R(\[-?\d+\])C|RC(\[-?\d+\])
At the core is the 'offset', -?\d+: an optional minus sign (-) followed by one or more digits. This sequence goes in brackets ([]) to give \[-?\d+\]. Then the regex allows these combinations:
R[offset]C[offset] or (|)
R[offset]C or (|)
RC[offset]
Option 2
The regex above won't match R, C, or RC. It will match R[0]C, RC[0], and R[0]C[0], which are kind of equivalent to RC. To eliminate those matches you might use this regex:
R(\[-?[1-9][0-9]*\])C(\[-?[1-9][0-9]*\])|R(\[-?[1-9][0-9]*\])C|RC(\[-?[1-9][0-9]*\])
But it seems entering R[0], C[0] and R[0]C[0] in my Excel (v2013) turns them into R, C and RC anyways - so you can avoid the additional complexity if this is not a concern.
Option 3
If you want to allow R, C and RC you can use a simpler regex:
R(\[-?\d+\])?C(\[-?\d+\])?
VBA test code
This uses Option 1.
Option Explicit
Sub Test()
Dim varTests As Variant
Dim varTest As Variant
Dim varMatches As Variant
Dim varMatch As Variant
varTests = Array("RC", _
"R[1]C", _
"RC[1]", _
"R[1]C[1]", _
"R[-1]C", _
"RC[-1]", _
"R[-1]C[-1]", _
"=SUM(A1:B2)", _
"RC[1]+R[-1]C+R[2]C[-99]", _
"R[-1]C-R[1]C[-44]-RC[999]+R[0]C[0]", _
"SearchValInCol2(Tabella1[articolo];[#articolo];Tabella1[b])")
For Each varTest In varTests
varMatches = FormulaContainsR1C1Reference(CStr(varTest))
Debug.Print "Input: " & CStr(varTest)
Debug.Print VBA.String(Len(CStr(varTest)) + 7, "-")
If IsEmpty(varMatches) Then
Debug.Print "No matches"
Else
Debug.Print UBound(varMatches) & " matches"
For Each varMatch In varMatches
Debug.Print varMatch
Next varMatch
End If
Debug.Print vbCrLf
Next varTest
End Sub
Function FormulaContainsR1C1Reference(ByVal strFormula As String) As Variant
Dim objRegex As Object
Dim strPattern As String
Dim objMatches As Object
Dim varMatches As Variant
Dim lngCounter As Long
Set objRegex = CreateObject("VBScript.RegExp")
With objRegex
' setup regex
.Global = True
.IgnoreCase = False
.Pattern = "R(\[-?\d+\])C(\[-?\d+\])|R(\[-?\d+\])C|RC(\[-?\d+\])"
' get matches
Set objMatches = .Execute(strFormula)
' iterate matches
If objMatches.Count > 0 Then
ReDim varMatches(1 To objMatches.Count)
For lngCounter = 1 To objMatches.Count
varMatches(lngCounter) = objMatches.Item(lngCounter - 1)
Next lngCounter
Else
varMatches = Empty
End If
End With
FormulaContainsR1C1Reference = varMatches
End Function
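Since the question asks for a yes/no test, here is a minimal sketch of a Boolean wrapper around the Option 1 pattern. It first strips double-quoted string literals, which seems to be what the unused """.*?""" pattern in the question was meant for. HasR1C1Reference is a hypothetical name, and like Option 1 itself the sketch only looks for relative references with at least one bracketed offset.

```vbscript
Option Explicit

' Hypothetical wrapper (name and structure are assumptions): True if the
' formula text contains a relative R1C1 reference outside string literals.
Function HasR1C1Reference(sFormula)
    Dim r : Set r = CreateObject("VBScript.RegExp")
    r.Global = True
    r.IgnoreCase = False
    ' Step 1: drop "..." literals so text such as "RC[1]" inside a string is ignored.
    r.Pattern = """[^""]*"""
    Dim sStripped : sStripped = r.Replace(sFormula, "")
    ' Step 2: test what is left against the Option 1 pattern.
    r.Pattern = "R(\[-?\d+\])C(\[-?\d+\])|R(\[-?\d+\])C|RC(\[-?\d+\])"
    HasR1C1Reference = r.Test(sStripped)
End Function

WScript.Echo CStr(HasR1C1Reference("=RC[1]+R[-1]C"))
WScript.Echo CStr(HasR1C1Reference("=IF(RC[-1]="""",""RC[2]"",1)"))
WScript.Echo CStr(HasR1C1Reference("=""RC[2]"""))
WScript.Echo CStr(HasR1C1Reference("=SearchValInCol2(Tabella1[articolo];[#articolo];Tabella1[b])"))
```

The first two calls should report a match and the last two should not: the third formula only contains the reference inside a string, and the table-column formula has no R...C sequence with a numeric offset.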
A1 style references
I posted a regex here for A1 style references:
^(?:[A-Z]|[A-Z][A-Z]|[A-X][A-F][A-D])(?:[1-9]|[1-9][0-9]|[1-9][0-9][0-9]|[1-9][0-9][0-9][0-9]|[1-9][0-9][0-9][0-9][0-9]|[1-9][0-9][0-9][0-9][0-9][0-9]|10[0-3][0-9][0-9][0-9][0-9]|104[0-7][0-9][0-9][0-9]|1048[0-4][0-9][0-9]|10485[0-6][0-9]|104857[0-6])$

Need vbs regex code to remove everything to the left of a word(s)

I have several strings. Examples:
PK - Package
EA -- Each
AB - Solo Container
TB -- Tube
I need to get just the text to the right of the last dash. Sometimes there may be a single dash, sometimes there may be 2 dashes (shouldn't be any more). So basically, this regex would return:
Package
Each
Solo Container
Tube
I'm always woefully ignorant when it comes to regex...
Edit:
Per karthik manchala's suggestion...
I tried the following:
objRegEx.Global = True
objRegEx.IgnoreCase = True
objRegEx.Pattern = "-\s*(\w+)"
strSearchString = _
"PK - Package"
strNewString = _
objRegEx.Replace(strSearchString, _
"")
MsgBox strNewString
and I'm getting the leftmost parts of the strings instead (PK, EA, etc...)
Am I not using the replace correctly?
Edit 2:
Played around a bit more and think I got it figured out. For anyone that may stumble upon this in the future, the following seems to have done the trick. Full code:
Set objRegEx = _
CreateObject("VBScript.RegExp")
objRegEx.Global = True
objRegEx.IgnoreCase = True
objRegEx.Pattern = "^[^-]*-* "
strSearchString = _
"PK - Package"
strNewString = _
objRegEx.Replace(strSearchString, _
"")
MsgBox strNewString
Message box shows "Package" even when there are 2 dashes.
- (.??)$
Matches from end to last dash
InStrRev, or StrReverse combined with InStr, would also work.
Do Until Inp.AtEndOfStream
A = Inp.Readline
Right(A, Len(A) - InstrRev(A, "-"))
Loop
You can use the following:
-\s*(\w+)
This world would be a better place if
(1) People "woefully ignorant" wrt their problem would refrain from restricting the range of possible solutions by asking for specific techniques (RegExp and Replacing in this case) and concentrate on the specs: possible inputs, expected outputs/results. This can be done with skeleton code that tests possible solutions. E.g.:
Option Explicit
Function qq(s) : qq = """" & s & """" : End Function
Function getTail(sInp)
getTail = "????"
End Function
Dim aTests : aTests = Array( _
Split("PK - Package|Package", "|") _
, Split("AB - Solo Container|Solo Container", "|") _
)
Dim aTest
For Each aTest In aTests
Dim sInp : sInp = aTest(0)
Dim sExp : sExp = aTest(1)
Dim sAct : sAct = getTail(sInp)
WScript.Echo "----", qq(sInp)
If sAct = sExp Then
WScript.Echo "ok"
WScript.Echo " result:", qq(sAct)
Else
WScript.Echo "not ok"
WScript.Echo " got:", qq(sAct)
WScript.Echo "expected:", qq(sExp)
End If
Next
output:
cscript 30067065-1.vbs
---- "PK - Package"
not ok
got: "????"
expected: "Package"
---- "AB - Solo Container"
not ok
got: "????"
expected: "Solo Container"
(2) People wouldn't try to answer with untested code. Exploiting the fact that you can re-define a Sub/Function in VBScript, adding
' karthik, used as intended (Submatch), fails for "AB - Solo Container|Solo Container"
Function getTail(sInp)
Dim r : Set r = New RegExp
r.Pattern = "-\s*(\w+)"
Dim ms : Set ms = r.Execute(sInp)
If 1 = ms.Count Then
getTail = ms(0).SubMatches(0)
Else
getTail = "NOT: 1 = ms.Count"
End If
End Function
after the first version of getTail(), you get
cscript 30067065-1.vbs
---- "PK - Package"
ok
result: "Package"
---- "AB - Solo Container"
not ok
got: "Solo"
expected: "Solo Container"
You can then easily test an improved pattern r.Pattern = "-\s*(.+)":
cscript 30067065-1.vbs
---- "PK - Package"
ok
result: "Package"
---- "AB - Solo Container"
ok
result: "Solo Container"
To see the flaw in Trigger's RegExp, you can add a further test case
Dim aTests : aTests = Array( _
Split("PK - Package|Package", "|") _
, Split("AB - Solo Container|Solo Container", "|") _
, Split("Just For Trigger - X|X", "|") _
)
and a new version of getTail()
' Trigger, RegExp
Function getTail(sInp)
Dim r : Set r = New RegExp
r.Pattern = "- (.??)$"
Dim ms : Set ms = r.Execute(sInp)
If 1 = ms.Count Then
getTail = ms(0).SubMatches(0)
Else
getTail = "NOT: 1 = ms.Count"
End If
End Function
Result:
cscript 30067065-1.vbs
---- "PK - Package"
not ok
got: "NOT: 1 = ms.Count"
expected: "Package"
---- "AB - Solo Container"
not ok
got: "NOT: 1 = ms.Count"
expected: "Solo Container"
---- "Just For Trigger - X"
ok
result: "X"
"(.??)" looks for zero or one character different from \n non-greedily. I just hope that you didn't need test code to see that
Right(A, Len(A) - InstrRev(A, "-"))
is not valid VBScript. Trigger's InStrRev technique could be used if improved further by Trim():
' Trigger, InStrRev improved
Function getTail(sInp)
getTail = Trim(Right(sInp, Len(sInp) - InstrRev(sInp, "-")))
End Function
Another non-RegExp approach uses Split():
' using Split
Function getTail(sInp)
Dim aTmp : aTmp = Split(sInp, "- ")
getTail = aTmp(UBound(aTmp))
End Function
(3) People would think twice before they upvote.
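The test cases above only use a single dash, while the question also lists double-dash inputs such as "EA -- Each". As far as I can tell, "-\s*(.+)" returns "- Each" for that input, because the match starts at the first dash. A minimal sketch of a variant that anchors at the end of the string and forbids further dashes in the captured tail, run against the four samples from the question (the pattern is a suggestion of mine, not taken from any of the answers):

```vbscript
Option Explicit

' Hypothetical variant of getTail(): capture what follows the last dash.
Function getTail(sInp)
    Dim r : Set r = New RegExp
    ' "-" then optional whitespace, then a dash-free tail up to the end of the string
    r.Pattern = "-\s*([^-]*)$"
    Dim ms : Set ms = r.Execute(sInp)
    If ms.Count = 1 Then
        getTail = Trim(ms(0).SubMatches(0))
    Else
        getTail = ""    ' no dash in the input
    End If
End Function

Dim sTest
For Each sTest In Array("PK - Package", "EA -- Each", "AB - Solo Container", "TB -- Tube")
    WScript.Echo sTest & " => [" & getTail(sTest) & "]"
Next
```

Because [^-]* cannot cross a dash, the only position where the match can reach the end of the string is the last dash, so single and double dashes are handled the same way.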

VBscript: how to Uppercase specific parts of text in a string, outside of quotes ('s)

I need your help.
In VBScript I have a string such as
s = 'Abc' and 'Def' Or 'Ghin' In 'jkl' not 'mnoR' And ... or ... NOT ... iN ...
and I want to uppercase these four specific operators (which may appear in any combination of lower and upper case): and, or, in, not.
These operators exist outside the quotes ('), because between the quotes I have business rules. This is important because, as you can see in the third rule ('Ghin'), I have 'in' in the name of the rule, and in these cases I do not want the text between the quotes (the business rule name) to be altered.
How can I solve this in VBScript, preferably using RegEx?
TIA
EDIT:
Thanks for your help.
Sorry, but I forgot to mention one detail: I can have text outside the quotes, namely "(", ")", "[", "]" or conditions such as "1 = 1", but again: the operators to be changed exist outside the quotes, and nothing inside the quotes is changed.
Using the previous example:
s = "('abc' and ['Def' Or 'Ghin'] In 'jkl' not 1=1 AND 'mnoR' And 'pqr' or 'xyz' NOT 'lmn' iN 'Opq')"
s = "('abc' AND ['Def' OR 'Ghin'] IN 'jkl' NOT 1=1 AND 'mnoR' AND 'pqr' OR 'xyz' NOT 'lmn' IN 'Opq')"
In other languages you may use a fancy lookaround pattern to (logically) apply a regexp to parts of your input only; in VBScript you should use either a regexp replace function with a state or Split().
Demo script for the first alternative:
Dim gb_magic : gb_magic = True
Function gf_magic(sMatch, nPos, sSrc)
    gf_magic = sMatch
    If "'" = sMatch Then
        gb_magic = Not gb_magic
    Else
        If gb_magic Then
            gf_magic = UCase(sMatch)
        End If
    End If
End Function
Dim s : s = "s = 'Abc and def' and 'not Def' Or 'Ghin' In 'jkl in or' not 'mnoR'"
WScript.Echo s
Dim r : Set r = New RegExp
r.Global = True
r.IgnoreCase = True
r.Pattern = "and|not|or|in|'"
gb_magic = True
s = r.Replace(s, GetRef("gf_magic"))
WScript.Echo s
output:
s = 'Abc and def' and 'not Def' Or 'Ghin' In 'jkl in or' not 'mnoR'
s = 'Abc and def' AND 'not Def' OR 'Ghin' IN 'jkl in or' NOT 'mnoR'
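The same stateful replace can be tried on the example from the question's EDIT, which has brackets and a condition outside the quotes. This is a minimal sketch with one change that is my assumption rather than part of the demo above: word boundaries (\b) around the operators, so that unquoted text containing "in" or "or" as part of a longer word is left alone. The names gbOutside and UpperOutsideQuotes are hypothetical.

```vbscript
Option Explicit

Dim gbOutside : gbOutside = True   ' True while the scan is outside '...'

' Replace callback: the pattern has no capturing groups, so it is called
' with the match, its position, and the source string.
Function UpperOutsideQuotes(sMatch, nPos, sSrc)
    UpperOutsideQuotes = sMatch
    If sMatch = "'" Then
        gbOutside = Not gbOutside          ' every quote toggles the state
    ElseIf gbOutside Then
        UpperOutsideQuotes = UCase(sMatch) ' operator outside quotes
    End If
End Function

Dim r : Set r = New RegExp
r.Global = True
r.IgnoreCase = True
r.Pattern = "\b(?:and|not|or|in)\b|'"

Dim s
s = "('abc' and ['Def' Or 'Ghin'] In 'jkl' not 1=1 AND 'mnoR' And 'pqr' or 'xyz' NOT 'lmn' iN 'Opq')"
gbOutside = True                           ' reset the state before each Replace call
WScript.Echo r.Replace(s, GetRef("UpperOutsideQuotes"))
```

The state lives in a global variable, so it has to be reset before every call to Replace, and the approach assumes the quotes in the input are balanced.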
This keeps the method shown in Ekkehard's solution, but moves the state variable into the regular expression.
It has the drawback of a string concatenation inside the function, but the function only gets called for the operators found and not for the quotes.
Dim originalString
originalString = "not 'Abc and def' and 'not Def' Or 'Ghin' In 'jkl in or' not 'mnoR' and"
Dim convertedString
Function correctCase(matchString,leftPart,operator,rightPart,position,sourceString)
correctCase = leftPart & UCase(operator) & rightPart
End Function
With New RegExp
.Pattern = "((?:'[^']*'){0,}\s*)(and|or|not|in)((?:\s*'[^']*'){0,})"
.Global = True
.IgnoreCase = True
convertedString = .Replace(originalString,GetRef("correctCase"))
End With
WScript.Echo originalString
WScript.Echo convertedString
Is using regular expressions a must?
s = "('abc' and ['Def' Or 'Ghin'] In 'jkl' not 1=1 AND 'mnoR' And 'pqr' or 'xyz' NOT 'lmn' iN 'Opq')"
p = split(s, "'")
for i = 0 to ubound(p) step 2
p(i) = ucase(p(i))
next
r = join(p, "'")
msgbox r
Something like this should work
Dim s
s = "'abc' and 'Def' Or 'Ghin' In 'jkl' not 'mnoR' And 'pqr' or 'xyz' NOT 'lmn' iN 'asd'"
s = RegExReplace(s,"\band\b","AND")
s = RegExReplace(s,"\bor\b","OR")
s = RegExReplace(s,"\bnot\b","NOT")
s = RegExReplace(s,"\bin\b","IN")
msgbox s
Function RegExReplace(OriginalStr, RegExPattern, NewStr)
Set objRegEx = CreateObject("VBScript.RegExp")
objRegEx.Global = True
objRegEx.IgnoreCase = True
objRegEx.Pattern = RegExPattern
RegExReplace=objRegEx.Replace(OriginalStr, NewStr)
End Function
I would agree a regex solution is the way to go. But it seems to me that you could just search for your keywords surrounded by spaces, unless of course your business rules include that type of pattern as well.
For Each k In Array(" AND ", " OR ", " IN ", " NOT ")
s = Replace(s, k, k, 1, -1, vbTextCompare)
Next
This will find your keywords (on their own, not contained within another word) and replace any instances with uppercase versions.

Regex Classic ASP

I've currently got a string which contains a URL, and I need to get the base URL.
The string I have is http://www.test.com/test-page/category.html
I am looking for a RegEx that will effectively remove any page/folder names at the end. The issue is that some people may enter the domain in the following formats:
http://www.test.com
www.test.co.uk/
www.test.info/test-page.html
www.test.gov/test-folder/test-page.html
It must return http://www.websitename.ext/ each time i.e. the domain name and extension (e.g. .info .com .co.uk etc) with a forward slash at the end.
Effectively it needs to return the base URL, without any page/folder names. Is there an easy way to do this with a Regular Expression?
Thanks.
My approach: Use a RegEx to extract the domain name. Then add http:// to the front and / to the end. Here's the RegEx:
^(?:http:\/\/)?([\w_]+(?:\.[\w_]+)+)(?=(?:\/|$))
Also see this answer to the question Extract root domain name from string. (It left me somewhat dissatisfied, although it pointed out the need to account for https, the port number, and user authentication info, which my RegEx does not do.)
Here is an implementation in VBScript. I put the RegEx in a constant and defined a function named GetDomainName(). You should be able to incorporate that function in your ASP page like this:
normalizedUrl = "http://" & GetDomainName(url) & "/"
You can also test my script from the command prompt by saving the code to a file named test.vbs and then passing it to cscript:
cscript test.vbs
Test Program
Option Explicit
Const REGEXPR = "^(?:http:\/\/)?([\w_]+(?:\.[\w_]+)+)(?=(?:\/|$))"
'                    ^^^^^^^^^   ^^^^^^   ^^^^^^^^^^       ^^^^
'                    A           B1       B2               C
'
' A - An optional 'http://' scheme
' B1 - Followed by one or more alpha-numeric characters
' B2 - Followed by one or more occurrences of a string
' that begins with a period that is followed by
' one or more alphanumeric characters, and
' C - Terminated by a slash or nothing.
Function GetDomainName(sUrl)
    Dim oRegex, oMatch, oMatches, oSubMatch
    Set oRegex = New RegExp
    oRegex.Pattern = REGEXPR
    oRegex.IgnoreCase = True
    oRegex.Global = False
    Set oMatches = oRegex.Execute(sUrl)
    If oMatches.Count > 0 Then
        GetDomainName = oMatches(0).SubMatches(0)
    Else
        GetDomainName = ""
    End If
End Function
Dim Data : Data = _
Array( _
"xhttp://www.test.com" _
, "http://www..test.com" _
, "http://www.test.com." _
, "http://www.test.com" _
, "www.test.co.uk/" _
, "www.test.co.uk/?q=42" _
, "www.test.info/test-page.html" _
, "www.test.gov/test-folder/test-page.html" _
, ".www.test.co.uk/" _
)
Dim sUrl, sDomainName
For Each sUrl In Data
sDomainName = GetDomainName(sUrl)
If sDomainName = "" Then
WScript.Echo "[ ] [" & sUrl & "]"
Else
WScript.Echo "[*] [" & sUrl & "] => [" & sDomainName & "]"
End If
Next
Expected Output:
[ ] [xhttp://www.test.com]
[ ] [http://www..test.com]
[ ] [http://www.test.com.]
[*] [http://www.test.com] => [www.test.com]
[*] [www.test.co.uk/] => [www.test.co.uk]
[*] [www.test.co.uk/?q=42] => [www.test.co.uk]
[*] [www.test.info/test-page.html] => [www.test.info]
[*] [www.test.gov/test-folder/test-page.html] => [www.test.gov]
[ ] [.www.test.co.uk/]
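As noted above, the pattern does not account for https, and [\w_] also rejects hyphens in host names. A minimal sketch of a variant that accepts both and returns the normalized URL directly; GetBaseUrl and the widened character class are assumptions of mine, and port numbers and user authentication info are still not handled:

```vbscript
Option Explicit

' Hypothetical variant of GetDomainName(): also accepts https:// and hyphens
' in host labels, and returns the normalized base URL ("" if nothing matches).
Function GetBaseUrl(sUrl)
    Dim oRegex : Set oRegex = New RegExp
    oRegex.IgnoreCase = True
    oRegex.Global = False
    oRegex.Pattern = "^(?:https?:\/\/)?([\w\-]+(?:\.[\w\-]+)+)(?=\/|$)"
    Dim oMatches : Set oMatches = oRegex.Execute(sUrl)
    If oMatches.Count > 0 Then
        ' The question asks for http:// in front and a slash at the end.
        GetBaseUrl = "http://" & oMatches(0).SubMatches(0) & "/"
    Else
        GetBaseUrl = ""
    End If
End Function

Dim sUrl
For Each sUrl In Array("https://www.test.com/test-page/category.html", _
                       "www.my-site.co.uk", _
                       "http://www..test.com")
    WScript.Echo "[" & sUrl & "] => [" & GetBaseUrl(sUrl) & "]"
Next
```

The first two inputs should come back as http://www.test.com/ and http://www.my-site.co.uk/, while the malformed third one yields an empty string, matching the behaviour of the test program above.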
I haven't coded Classic ASP in 12 years and this is totally untested.
result = "http://" & Split(Replace(url, "http://",""),"/")(0) & "/"