Counting lowercase with regex

I am trying to count the number of lowercase characters in a string using regex. I think I am missing something blindingly obvious but can't figure out what! It is for an old Classic ASP page.
<%
Password="abcd123"
Set myRegExp = New RegExp
myRegExp.Pattern = "(.*[a-z].*)"
Response.Write myRegExp.Execute(Password).Count
%>
The script returns 1 rather than 4.

Your pattern is wrong: ([a-z]), or even just [a-z], is enough. With the .* on both sides, the match captures everything around your lowercase character, so the whole string becomes a single match.

I think the following is what you want to do.
Password="abcd123"
Set myRegExp = New RegExp
myRegExp.Global = True ' This is required to get all matches
myRegExp.Pattern = "[a-z]"
Response.Write myRegExp.Execute(Password).Count
But I have some suggestions for you.
You can add the + quantifier so that each match covers a whole run of lowercase letters. This reduces the number of matches to process.
You need to set .Global to True to get all matches, not only the first one.
With this approach, you need to loop through the matches collection (returned by myRegExp.Execute) and add up the length of each match, since one match can now contain several letters.
Password="abcd123fooBar"
Set myRegExp = New RegExp
myRegExp.Pattern = "[a-z]+"
myRegExp.Global = True
count = 0
For Each match In myRegExp.Execute(Password)
count = count + match.Length
Next
Response.Write count 'prints 9
And here's another way to do the same.
This matches all non-lowercase characters and removes them from the string. You can then get the count with the Len function.
Password="abcd123fooBar"
Set myRegExp = New RegExp
myRegExp.Pattern = "[^a-z]+"
myRegExp.Global = True
count = Len(myRegExp.Replace(Password, ""))
Response.Write count 'prints 9

Why not just use [a-z] instead? When you use (.*[a-z].*), it matches all of your input as one piece, not character by character.
You can see the difference by trying both patterns in a regex tester:
[a-z]: one match per lowercase letter
(.*[a-z].*) (your regex): one match for the whole string
I also suggest reading:
The Dot Matches (Almost) Any Character
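To see the difference without an online tester, here is a minimal VBScript sketch. It assumes Windows Script Host (run it with cscript), so output goes through WScript.Echo; in an ASP page you would use Response.Write instead. CountMatches is a hypothetical helper name, not part of the RegExp API.

```vbscript
' Hypothetical helper: number of matches of pattern in text, with Global on
Function CountMatches(text, pattern)
    Dim re
    Set re = New RegExp
    re.Pattern = pattern
    re.Global = True          ' return every match, not just the first one
    CountMatches = re.Execute(text).Count
End Function

WScript.Echo CountMatches("abcd123", "[a-z]")        ' one match per letter: 4
WScript.Echo CountMatches("abcd123", "(.*[a-z].*)")  ' .* swallows everything: 1
```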

Related

How to test for specific characters with regex in VBA

I need to test a string variable to ensure it matches a specific format:
XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
...where X can be any alphanumeric character (a-z, 0-9).
I've tried the following, but it doesn't seem to work (test values constantly fail):
If val Like "^([A-Za-z0-9_]{8})([-]{1})([A-Za-z0-9_]{4})([-]{1})([A-Za-z0-9_]{4})([-]{1})([A-Za-z0-9_]{4})([-]{1})([A-Za-z0-9_]{12})" Then
MsgBox "OK"
Else
MsgBox "FAIL"
End If
fnCheckSubscriptionID "fdda752d-32de-474e-959e-4b5bf7574436"
Any pointers? I don't mind if this can be achieved in vba or with a formula.
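You are already using the ^ beginning-of-string anchor, which is terrific. You also need the $ end-of-string anchor; otherwise, in the last group, the regex engine can match the first 12 characters of a longer group (e.g. 15 characters). Note also that VBA's Like operator does not understand regular-expression syntax at all, so the pattern has to go through a RegExp object, as in the sample code below.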
I rewrote your regex in a more compact way:
^[A-Z0-9]{8}-(?:[A-Z0-9]{4}-){3}[A-Z0-9]{12}$
Note these few tweaks:
[-]{1} can just be expressed with -
I removed the underscores as you say you only want letters and digits. If you do want underscores, instead of [A-Z0-9]{8} (for instance), you can just write \w{8} as \w matches letters, digits and underscores.
Removed the lowercase letters. If you do want to allow lowercase letters, we'll turn on case-insensitive mode in the code (see line 3 of the sample code below).
No need for (capturing groups), so removed the parentheses
We have three groups of four characters followed by a dash, so I wrote the non-capturing group (?:[A-Z0-9]{4}-) with a {3} quantifier
Sample code
Dim myRegExp, FoundMatch
Set myRegExp = New RegExp
myRegExp.IgnoreCase = True
myRegExp.Pattern = "^[A-Z0-9]{8}-(?:[A-Z0-9]{4}-){3}[A-Z0-9]{12}$"
FoundMatch = myRegExp.Test(SubjectString)
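Here is a quick check of why the $ anchor matters, as a VBScript sketch for cscript. IsMatch and the sample ID are made up for illustration; in VBA you would need the "Microsoft VBScript Regular Expressions 5.5" reference for New RegExp. The sample's last group has 15 characters, so only the pattern without $ accepts it.

```vbscript
' Hypothetical helper: True if text matches pattern, ignoring case
Function IsMatch(text, pattern)
    Dim re
    Set re = New RegExp
    re.IgnoreCase = True
    re.Pattern = pattern
    IsMatch = re.Test(text)
End Function

Dim longId
longId = "fdda752d-32de-474e-959e-4b5bf7574436abc"   ' last group has 15 characters

' Without $: the engine is happy with the first 12 characters of the last group
WScript.Echo IsMatch(longId, "^[A-Z0-9]{8}-(?:[A-Z0-9]{4}-){3}[A-Z0-9]{12}")
' With $: the 3 extra characters make the match fail
WScript.Echo IsMatch(longId, "^[A-Z0-9]{8}-(?:[A-Z0-9]{4}-){3}[A-Z0-9]{12}$")
```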
You can do this either with a regular expression, or with just native VBA. I am assuming from your code that the underscore character is also valid in the string.
To do this with native VBA, you need to build up the Like pattern yourself, since Like has no quantifiers. Also, Option Compare Text makes the Like comparison case-insensitive.
Option Explicit
Option Compare Text
Function TestFormat(S As String) As Boolean
'Sections
Dim S1 As String, S2_4 As String, S5 As String
Dim sLike As String
With WorksheetFunction
S1 = .Rept("[A-Z0-9_]", 8)
S2_4 = .Rept("[A-Z0-9_]", 4)
S5 = .Rept("[A-Z0-9_]", 12)
sLike = S1 & .Rept("-" & S2_4, 3) & "-" & S5
End With
TestFormat = S Like sLike
End Function
With regular expressions, the pattern is simpler to build, but the execution time may be longer, and that may make a difference if you are processing very large amounts of data.
Function TestFormatRegex(S As String) As Boolean
Dim RE As Object
Set RE = CreateObject("vbscript.regexp")
With RE
.Global = True
.MultiLine = False ' ^ and $ must anchor the whole string, not each line
.Pattern = "^\w{8}(?:-\w{4}){3}-\w{12}$"
TestFormatRegex = .test(S)
End With
End Function
Sub Test()
MsgBox fnCheckSubscriptionID("XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX")
End Sub
Function fnCheckSubscriptionID(strCont)
' Tools - References - add "Microsoft VBScript Regular Expressions 5.5"
With New RegExp
.Pattern = "^\w{8}-\w{4}-\w{4}-\w{4}-\w{12}$"
.Global = True
.MultiLine = False ' ^ and $ must anchor the whole string, not each line
fnCheckSubscriptionID = .Test(strCont)
End With
End Function
In case of any problems with early binding you can use late binding With CreateObject("VBScript.RegExp") instead of With New RegExp.

Find and replace from given word to right parenthesis

I have just started using VBA to automate a few day-to-day tasks, so please excuse me if I sound very naive.
I'm trying to open a Word document and then search for an expression to highlight (bold) it; however, I'm getting the error "User defined type not defined".
I'm able to open the Word document but unable to perform the pattern search. I have gathered bits and pieces of code from the internet, but it's not working.
I'm using Office 2013 and have added Microsoft VBScript Regular Expressions 5.5 to the references.
The pattern I'm searching for starts at "Dear" and runs until a ) is encountered.
Cheers #GoingMad#
Sub Pattern_Replace()
Dim regEx, Match, Matches
Dim rngRange As Range
Dim pathh As String, i As Integer
pathh = "D:\Docs\Macro.docx"
Dim pathhi As String
Dim from_text As String, to_text As String
Dim WA As Object, WD As Object
Set WA = CreateObject("Word.Application")
WA.Documents.Open (pathh)
WA.Visible = True
Set regEx = New RegExp
regEx.Pattern = "Dear[^0-9<>]+)"
regEx.IgnoreCase = False
regEx.Global = True
Set Matches = regEx.Execute(ActiveDocument.Range.Text)
For Each Match In Matches
ActiveDocument.Range(Match.FirstIndex, Match.FirstIndex + Len(Match.Value)).Bold = True
Next
End Sub
You need to escape the parenthesis ")" within the regex, using a backslash:
regex.Pattern = "Dear[^0-9<>]+\)"
This is because ) has a special meaning in a regular expression: it closes a group.
I would personally also split the reference to the Word-Range across a few lines:
Set rngRange = ActiveDocument.Range
rngRange.Expand Unit:=wdStory
Set Matches = regex.Execute(rngRange.Text)
although this isn't necessary.
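The escaping fix can be checked in isolation with a small VBScript sketch, assuming cscript and a sample sentence of the "Dear ... )" form. The code only checks that some run-time error is raised for the unescaped pattern, rather than relying on a particular error number.

```vbscript
Dim re, m, errNum, sample
sample = "Dear aunt sally ) I have gone to school."
Set re = New RegExp

' 1) Unescaped ")": it closes a group that was never opened,
'    so the pattern is invalid and the engine raises a run-time error
On Error Resume Next
re.Pattern = "Dear[^0-9<>]+)"
Set m = re.Execute(sample)
errNum = Err.Number            ' remember the error before it is cleared
On Error GoTo 0                ' restore normal error handling (also clears Err)
WScript.Echo "Error raised by unescaped pattern: " & errNum

' 2) Escaped "\)": now it is just a literal parenthesis
re.Pattern = "Dear[^0-9<>]+\)"
Set m = re.Execute(sample)
WScript.Echo m.Item(0).Value   ' Dear aunt sally )
```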
Consider the following text
Dear aunt sally ) I have gone to school.
Your regex pattern would be "Dear[^)]+"
Find the word Dear
Match Any character that is not ")"
Repeat
You can try this pattern on Refiddle.
This one will include the parenthesis. Dear[\w\s]+\)
Find the word Dear
Match any word character (letter, digit or underscore) or whitespace
Repeat as needed
Until a right parenthesis is found
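The two patterns behave differently once the text contains punctuation, and a greedy class such as [^0-9<>]+ behaves differently again when there is more than one ")". Here is a VBScript sketch with a made-up sample sentence; ListMatches is a hypothetical helper, and WScript.Echo assumes cscript.

```vbscript
' Hypothetical helper: all matches of pattern, each wrapped in [ ]
Function ListMatches(text, pattern)
    Dim re, m, out
    Set re = New RegExp
    re.Global = True
    re.Pattern = pattern
    out = ""
    For Each m In re.Execute(text)
        out = out & "[" & m.Value & "]"
    Next
    ListMatches = out
End Function

Dim sample
sample = "Dear Mr. Smith) and (Dear Sir) here"
WScript.Echo ListMatches(sample, "Dear[^)]+\)")      ' [Dear Mr. Smith)][Dear Sir)]
WScript.Echo ListMatches(sample, "Dear[\w\s]+\)")    ' [Dear Sir)]
WScript.Echo ListMatches(sample, "Dear[^0-9<>]+\)")  ' [Dear Mr. Smith) and (Dear Sir)]
```

[\w\s] gives up at the period in "Mr.", while [^)] only stops at a closing parenthesis, which is usually what "from Dear up to the next )" means. The greedy [^0-9<>]+ runs on to the last ")" in the text.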
You don't need regex for this - a wildcard Find/Replace in Word will do the job far more efficiently:
With WA.ActiveDocument.Range.Find
.ClearFormatting
.Text = "Dear[!\)]#\)"
.Replacement.ClearFormatting
.Replacement.Font.Bold = True
.Replacement.Text = "^&"
.Format = True
.Forward = True
.Wrap = 1 ' wdFindContinue; Word's named constants are not available when Word is late-bound via CreateObject
.MatchWildcards = True
.Execute Replace:=2 ' wdReplaceAll
End With

How to change case of matching letter with a VBA regex Replace?

I have a column of lists of codes like the following.
2.A.B, 1.C.D, A.21.C.D, 1.C.D.11.C.D
6.A.A.5.F.A, 2.B.C.H.1
8.ABC.B, A.B.C.D
12.E.A, 3.NO.T
A.3.B.C.x, 1.N.N.9.J.K
I want to find all instances of two single upper-case letters separated by a period, but only those that follow a number less than 6. I want to remove the period between the letters and convert the second letter to lower case. Desired output:
2.Ab, 1.Cd, A.21.C.D, 1.Cd.11.C.D
6.A.A.5.Fa, 2.Bc.H.1
8.ABC.B, A.B.C.D
12.E.A, 3.NO.T
A.3.Bc.x, 1.Nn.9.J.K
I have the following code in VBA.
Sub fixBlah()
Dim re As VBScript_RegExp_55.RegExp
Set re = New VBScript_RegExp_55.RegExp
re.Global = True
re.Pattern = "\b([1-5]\.[A-Z])\.([A-Z])\b"
For Each c In Selection.Cells
c.Value = re.Replace(c.Value, "$1$2")
Next c
End Sub
This removes the period, but doesn't handle the lower-case requirement. I know in other flavors of regular expressions, I can use something like
re.Replace("$1\L$2\E")
but this does not have the desired effect in VBA. I tried googling for this functionality, but I wasn't able to find anything. Is there a way to do this with a simple re.Replace() statement in VBA?
If not, how would I go about achieving this otherwise? The pattern matching is complex enough that I don't even want to think about doing this without regular expressions.
[I have a solution I worked up, posted below, but I'm hoping someone can come up with something simpler.]
Here is a workaround that uses the properties of each individual regex match to make the VBA Replace() function replace only the text from the match and nothing else.
Sub fixBlah2()
Dim re As VBScript_RegExp_55.RegExp, Matches As VBScript_RegExp_55.MatchCollection
Dim M As VBScript_RegExp_55.Match
Dim tmpChr As String, pre As String, i As Integer
Set re = New VBScript_RegExp_55.RegExp
re.Global = True
re.Pattern = "\b([1-5]\.[A-Z])\.([A-Z])\b"
For Each c In Selection.Cells
'Count of number of replacements made. This is used to adjust M.FirstIndex
' so that it still matches correct substring even after substitutions.
i = 0
Set Matches = re.Execute(c.Value)
For Each M In Matches
tmpChr = LCase(M.SubMatches.Item(1))
If M.FirstIndex > 0 Then
pre = Left(c.Value, M.FirstIndex - i)
Else
pre = ""
End If
c.Value = pre & Replace(c.Value, M.Value, M.SubMatches.Item(0) & tmpChr, _
M.FirstIndex + 1 - i, 1)
i = i + 1
Next M
Next c
End Sub
This is documented behavior: if you specify a start position, Replace() returns the string beginning at that position (everything before it is dropped). That is why the pre variable is used to put back the first part of the string that Replace clips off.
This question is old, but I have another workaround. I use a double regex, so to speak: the first pass runs Execute to find the matches, then I loop through each of those items and replace it with a lowercase version. For example:
Sub fixBlah()
Dim re As VBScript_RegExp_55.RegExp
Dim ToReplace As Object
Dim LcaseVersion As String
Dim ItemCt As Integer
Dim c As Range
Set re = New VBScript_RegExp_55.RegExp
For Each c In Selection.Cells
With re
.Global = True
.Pattern = "\b([1-5]\.[A-Z])\.([A-Z])\b"
Set ToReplace = .Execute(c.Value)
End With
'This generates a list of items that match. Now build the lowercase version of each and replace it
For ItemCt = 0 To ToReplace.Count - 1
'Keep group 1, drop the period and lowercase the second letter: "2.A.B" -> "2.Ab"
LcaseVersion = ToReplace.Item(ItemCt).SubMatches(0) & LCase(ToReplace.Item(ItemCt).SubMatches(1))
With re
'Look for that specific item: escape its periods so they are literal, and keep \b so "1.C.D" does not also hit the end of "11.C.D"
.Pattern = "\b" & Replace(ToReplace.Item(ItemCt).Value, ".", "\.") & "\b"
c.Value = .Replace(c.Value, LcaseVersion)
End With
Next ItemCt
Next c
End Sub
I hope this helps!
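A simpler variant of the FirstIndex idea, sketched in VBScript under the assumption that each cell is handled as a plain string: walk the matches once, copy the text between them unchanged, and append SubMatches(0) plus the lowercased SubMatches(1). LowerSecondLetter is a hypothetical name, and WScript.Echo assumes cscript.

```vbscript
' Hypothetical helper: "2.A.B" -> "2.Ab" for every digit(1-5).Letter.Letter group
Function LowerSecondLetter(text)
    Dim re, m, result, lastPos
    Set re = New RegExp
    re.Global = True
    re.Pattern = "\b([1-5]\.[A-Z])\.([A-Z])\b"
    result = ""
    lastPos = 0   ' 0-based index just past the previous match
    For Each m In re.Execute(text)
        ' Copy the unmatched text between the previous match and this one
        result = result & Mid(text, lastPos + 1, m.FirstIndex - lastPos)
        ' Rebuild the match: group 1 unchanged, period dropped, second letter lowercased
        result = result & m.SubMatches(0) & LCase(m.SubMatches(1))
        lastPos = m.FirstIndex + m.Length
    Next
    ' Append whatever follows the last match
    LowerSecondLetter = result & Mid(text, lastPos + 1)
End Function

WScript.Echo LowerSecondLetter("2.A.B, 1.C.D, A.21.C.D, 1.C.D.11.C.D")
```

In Excel VBA (with the VBScript_RegExp_55 reference) you could call it as c.Value = LowerSecondLetter(c.Value) inside the For Each loop. Because each match is rebuilt from its own FirstIndex, there is no need to correct the offsets after earlier replacements.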

ASP Classic: Check if string only consists of valid chars

I've been checking all over the internet but really can't find a specific solution to my problem.
How do I check if a string consists only of the declared valid characters?
I want my string to consist of only 0-9, A-Z and a-z.
So the string oifrmf9RWGEWRG3oi4m3ofm3mklwef-qæw should be invalid because of - and æ,
while the string joidsamfoiWRGWRGmoi34m3f should be valid.
I have been using the built-in RegExp to strip the strings, but is it possible to just make it check and return a boolean true or false?
my regexp:
set pw = new regexp
pw.global = true
pw.pattern = "[^a-zA-Z0-9]"
newstring = pw.replace("iownfiwefnoi3w4mtl3.-34ø'3", "")
Thanks :)
You could call Test, which returns True or False. With the pattern [^a-zA-Z0-9], True means the string contains at least one invalid character:
If( pw.Test("string") ) Then
'' Do something
End If
Try -
Dim myRegExp, FoundMatch
Set myRegExp = New RegExp
myRegExp.Pattern = "[^a-zA-Z0-9]"
FoundMatch = myRegExp.Test("iownfiwefnoi3w4mtl3.-34ø'3")
If FoundMatch is true the RegEx engine has found a character that is not a-z or A-Z or 0-9 and your string is not valid.
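If you want a single True/False answer to "is the whole string valid?", you can also anchor a positive class instead of searching for a negative one. A minimal VBScript sketch follows; IsAlphanumeric is a made-up name, and WScript.Echo assumes cscript (use Response.Write in ASP).

```vbscript
' Hypothetical helper: True only if s is non-empty and every character is 0-9, A-Z or a-z
Function IsAlphanumeric(s)
    Dim re
    Set re = New RegExp
    re.Pattern = "^[a-zA-Z0-9]+$"   ' ^ and $ force the whole string to match
    IsAlphanumeric = re.Test(s)
End Function

WScript.Echo IsAlphanumeric("joidsamfoiWRGWRGmoi34m3f")            ' True
WScript.Echo IsAlphanumeric("oifrmf9RWGEWRG3oi4m3ofm3mklwef-qæw")  ' False
```

Because MultiLine is left off, ^ and $ anchor the whole string. An empty string returns False; use * instead of + if empty input should pass.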
You could do something like:
Set match = pw.execute("iownfiwefnoi3w4mtl3.-34ø'3")
if match.count > 0 then
' your pattern matched, so it's invalid
badString = true
else
badString = false
end if
Rather than replacing, you can check whether anything outside the whitelist matches. The Matches collection returned by Execute can also be looped over with the usual For Each syntax.
[a-zA-Z0-9] works... I tried it against your string here http://gskinner.com/RegExr/?2u7c3 and here http://regexpal.com/ ... take the caret out. I also can't remember which regex engine VBScript uses, but that might have something to do with your problem. This also works...
\D?\w

using classic asp for regular expression

We have some Classic ASP sites, and I'm working on them a little. I was wondering how I can write a regular expression check and extract the matched text.
The text I want to match is the script's name.
So let's say this:
Response.Write Request.ServerVariables("SCRIPT_NAME")
Prints out:
review_blabla.asp
review_foo.asp
review_bar.asp
How can I get the blabla, foo and bar from there?
Thanks.
Whilst Yots' answer is almost certainly correct, you can achieve the result you are looking for with a lot less code and somewhat more clearly:
'A handy function I keep lying around for RegEx matches'
Function RegExResults(strTarget, strPattern)
Set regEx = New RegExp
regEx.Pattern = strPattern
regEx.Global = true
Set RegExResults = regEx.Execute(strTarget)
Set regEx = Nothing
End Function
'Pass the original string and pattern into the function and get a collection object back'
Set arrResults = RegExResults(Request.ServerVariables("SCRIPT_NAME"), "review_(.*?)\.asp")
'In your pattern the answer is the first group, so all you need is'
For each result in arrResults
Response.Write(result.Submatches(0))
Next
Set arrResults = Nothing
Additionally, I have yet to find a better RegEx playground than Regexr, it's brilliant for trying out your regex patterns before diving into code.
You have to use the Submatches Collection from the Match Object to get your data out of the review_(.*?)\.asp Pattern
Function getScriptNamePart(scriptname)
dim RegEx : Set RegEx = New RegExp
dim result : result = ""
With RegEx
.Pattern = "review_(.*?)\.asp"
.IgnoreCase = True
.Global = True
End With
Dim Match, Submatch
dim Matches : Set Matches = RegEx.Execute(scriptname)
dim SubMatches
For Each Match in Matches
For Each Submatch in Match.SubMatches
result = Submatch
Exit For
Next
Exit For
Next
Set Matches = Nothing
Set SubMatches = Nothing
Set Match = Nothing
Set RegEx = Nothing
getScriptNamePart = result
End Function
You can do
review_(.*?)\.asp
You can try it on Regexr.
You will then find your result in capture group 1.
You can use the RegExp object to do this.
Your code would look like this:
Set RegularExpressionObject = New RegExp
RegularExpressionObject.Pattern = "review_(.*)\.asp"
Set matches = RegularExpressionObject.Execute("review_blabla.asp")
Sorry, I can't test this code right now.
Check out usage at MSDN http://msdn.microsoft.com/en-us/library/ms974570.aspx
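Building on the RegExp answer above, here is a sketch that also reads capture group 1 and copes with names that don't match. GetReviewName is a hypothetical helper, and WScript.Echo assumes cscript. Note the Set: Execute returns a MatchCollection object, so it must be assigned with Set.

```vbscript
' Hypothetical helper: the part between "review_" and ".asp", or "" if there is none
Function GetReviewName(scriptName)
    Dim re, found
    Set re = New RegExp
    re.IgnoreCase = True
    re.Pattern = "review_(.*?)\.asp"
    Set found = re.Execute(scriptName)   ' Execute returns an object, so Set is required
    If found.Count > 0 Then
        GetReviewName = found.Item(0).SubMatches(0)   ' capture group 1
    Else
        GetReviewName = ""
    End If
End Function

WScript.Echo GetReviewName("/review_blabla.asp")   ' blabla
```

In an ASP page you would call it as Response.Write GetReviewName(Request.ServerVariables("SCRIPT_NAME")); SCRIPT_NAME typically includes the virtual path (e.g. a leading "/"), which the unanchored pattern simply skips over.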