Regular expression is treating group as a string

I have a regular expression that uses the matched value from another regex in it. But when I test the regular expression, it's not capturing the second regex group. Instead, it's treating the group as a string. How can I get this regex to output the group?
Private Sub CreateGraphicsFunction(sender As Object, e As EventArgs)
    Dim Regex = New Regex("infoEntityIdent=""(ICN.+?)[""].*?[>]")
    Dim ICNFiles = Directory.EnumerateFiles(MoveToPath, "*.*", SearchOption.AllDirectories)
    For Each tFile In ICNFiles
        Dim input = File.ReadAllText(tFile)
        Dim match = Regex.Match(input)
        If match.Success Then
            GraphicList.Add(match.Groups(1).Value)
            Dim Regex2 = New Regex("<!ENTITY " & match.Groups(1).Value & " SYSTEM ""(ICN.+?[.]\w.+?)[""]")
            Debug.Write(Regex2) ' outputs !ENTITY ICN-GAASIB0-00-051105-A-0YJB5-00005-A-001-01 SYSTEM "(ICN.+?[.]\w.+)["]
            Dim sysFileMatch = Regex2.Match(input)
            If sysFileMatch.Success Then
                ICNList.Add(sysFileMatch.Groups(1).Value)
                Debug.Write("found ICN " & sysFileMatch.Groups(1).Value)
            End If
        End If
    Next
End Sub
Examples
The first regex captures the ICN number:
New Regex("infoEntityIdent=""(ICN.+?)[""].*?[>]")
From there, I want to use the value captured in that group to search the file again and find the matching ICN with its file extension. So I insert the captured group from the first regex into a new regex:
New Regex("<!ENTITY " & match.Groups(1).Value & " SYSTEM ""(ICN.+?[.]\w.+?)[""]")
When I print this regex, the output is:
!ENTITY ICN-GAASIB0-00-051105-A-0YJB5-00005-A-001-01 SYSTEM "(ICN.+?[.]\w.+)["]
It seems to ignore the second regex group and treat it as part of the string instead of using it as a group. What I want is the ICN number with extension that follows SYSTEM.
Latest code sample I tried in order to get it to work:
Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
    Dim Files = Directory.EnumerateFiles(MovePath, "*.*", SearchOption.AllDirectories)
    For Each tFile In Files
        Dim input = File.ReadAllText(tFile)
        ' The pattern must end with the closing quote ("" inside a VB string literal)
        Dim strREGEX = New Regex("(?=[\S\s]*?infoEntityIdent\s*=\s*""\s*(ICN[\S\s]+?)\s*""[\S\s]*?>)[\S\s]*?<!ENTITY\s+\1\s+SYSTEM\s+""\s*(ICN[\S\s]+?\.\w[\S\s]+?)\s*""")
        ' Match against the file contents, not the file path
        Dim match = strREGEX.Match(input)
        If match.Success Then
            Debug.Write(match.Groups(2).Value)
        Else
            Debug.Write(match.Groups(2).Value & " was not found")
        End If
    Next
End Sub

Combine both regexes into a single regex.
This avoids the errors that creep in when the second pattern is built by hand.
The pattern below is your two regexes combined into one,
adjusted to be more robust.
I have no way of checking whether it matches, because you never
posted a target string.
Raw: (?=[\S\s]*?infoEntityIdent\s*=\s*"\s*(ICN[\S\s]+?)\s*"[\S\s]*?>)[\S\s]*?<!ENTITY\s+\1\s+SYSTEM\s+"\s*(ICN[\S\s]+?\.\w[\S\s]+?)\s*"
Stringed: #"(?=[\S\s]*?infoEntityIdent\s*=\s*""\s*(ICN[\S\s]+?)\s*""[\S\s]*?>)[\S\s]*?<!ENTITY\s+\1\s+SYSTEM\s+""\s*(ICN[\S\s]+?\.\w[\S\s]+?)\s*"""
Formatted and Explained:
(?=                             # Look ahead to find the ID ICN
    [\S\s]*?
    infoEntityIdent \s* = \s*
    "
    \s*
    ( ICN [\S\s]+? )            # (1), Entity IDent ICN
    \s*
    "
    [\S\s]*? >
)
                                # Consume now:
[\S\s]*?                        # Find the ID ICN inside an ENTITY
<!ENTITY \s+
\1                              # Back reference to Entity IDent ICN
\s+ SYSTEM \s+
"
\s*
(                               # (2 start), Some other ICN junk
    ICN
    [\S\s]+?
    \.
    \w
    [\S\s]+?
)                               # (2 end)
\s*
"
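To see the lookahead-plus-backreference idea in action, here is a minimal sketch. It is in Python rather than VB.NET, and the sample input is made up (the question never showed a target string), so treat it as an illustration of the technique, not a tested drop-in for the .NET code.

```python
import re

# Single-pattern version: the lookahead captures the identifier (group 1),
# and the backreference \1 then requires the same text inside the ENTITY
# declaration. Group 2 is the file name after SYSTEM.
COMBINED = re.compile(
    r'(?=[\S\s]*?infoEntityIdent\s*=\s*"\s*(ICN[\S\s]+?)\s*"[\S\s]*?>)'
    r'[\S\s]*?<!ENTITY\s+\1\s+SYSTEM\s+"\s*(ICN[\S\s]+?\.\w[\S\s]+?)\s*"'
)

# Hypothetical input, invented for this example.
SAMPLE = (
    '<!ENTITY ICN-GAASIB0-00-051105-A-0YJB5-00005-A-001-01 SYSTEM '
    '"ICN-GAASIB0-00-051105-A-0YJB5-00005-A-001-01.cgm" NDATA cgm>\n'
    '<graphic infoEntityIdent="ICN-GAASIB0-00-051105-A-0YJB5-00005-A-001-01">\n'
)


def find_entity_file(text):
    """Return (identifier, file name), or None if nothing matches."""
    m = COMBINED.search(text)
    return (m.group(1), m.group(2)) if m else None


print(find_entity_file(SAMPLE))
```

Because the identifier is captured inside a lookahead, it does not matter whether the ENTITY declaration comes before or after the infoEntityIdent attribute in the file.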

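You are most likely going to want to "escape" your "unknown" result from your first search before using it in your new regular expression. Otherwise any regex metacharacters it contains (such as . or +) are interpreted as pattern syntax rather than as literal text.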
Something like:
Dim EscapedSearchValue As String = Regex.Escape(match.Groups(1).Value)
Dim Regex2 = New Regex("<!ENTITY " & EscapedSearchValue & " SYSTEM ""(ICN.+?[.]\w.+?)[""]")
See Regex.Escape(String) Method
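The same two-step approach can be sketched in Python, where re.escape plays the role of Regex.Escape. The function name and the sample text are made up for illustration; the identifier deliberately contains a + to show what goes wrong without escaping.

```python
import re

# Hypothetical input; the "+" in the identifier is there to show the problem.
SAMPLE = (
    '<!ENTITY ICN-A+B SYSTEM "ICN-A+B.cgm" NDATA cgm>\n'
    '<graphic infoEntityIdent="ICN-A+B">\n'
)


def find_system_file(text):
    """Two-step lookup: capture the identifier, then find its ENTITY line."""
    first = re.search(r'infoEntityIdent="(ICN.+?)"', text)
    if first is None:
        return None
    # Escape the captured text so characters such as "." or "+" are matched
    # literally instead of being read as regex operators.
    ident = re.escape(first.group(1))
    second = re.search('<!ENTITY ' + ident + r' SYSTEM "(ICN.+?\.\w.+?)"', text)
    return second.group(1) if second else None


print(find_system_file(SAMPLE))
```

Without the escape call, the A+B in the identifier would mean "one or more A followed by B", which does not match the literal text, so the second search would find nothing.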

Related

Regex: match string between two strings within an Excel Visual Basic application (VBA) function (macro, module). (regular expression)

I have this wonderful regular expression: (?<=, )(.*)(?= \(), which matches any characters between ", " and " (".
For example, in the string "Hey man, my regex is Super (Mega) Cool (SC)" it matches "my regex is Super (Mega) Cool". I tested it in various regex testers (e.g. https://extendsclass.com/regex-tester.html#ruby).
However, when I use it in an Excel VBA module to create my own function, it does not work (see below).
Function extrCountryN(cellRef) As String
    Dim RE As Object, MC As Object, M As Object
    Dim sTemp As Variant
    Const sPat As String = "((?<=, )(.*)(?= \())"
    Set RE = CreateObject("vbscript.regexp")
    With RE
        .Global = True
        .MultiLine = True
        .Pattern = sPat
        If .Test(cellRef) Then
            Set MC = .Execute(cellRef)
            For Each M In MC
                sTemp = sTemp & ", " & M.SubMatches(0)
            Next M
        End If
    End With
    extrCountryN = Mid(sTemp, 3)
End Function
'https://extendsclass.com/regex-tester.html
A similar regex in the same module works perfectly fine for me, e.g. ^(.*?)(?= \() successfully matches everything before the first "(".
How can I fix this?

As the VBA regex engine does not support lookbehind assertions, you can remove it and use a consuming pattern instead. It is simple in this case because you are actually only using the captured value (with M.SubMatches(0)) in your code.
So, the quick fix is
Const sPat As String = ", (.*)(?= \()"
If you need to deal with tabs or spaces, or any whitespace, you need \s rather than a literal space:
Const sPat As String = ",\s+(.*)(?=\s\()"
Details:
, - a comma
\s+ - one or more whitespaces
(.*) - Group 1: any zero or more chars other than line break chars as many as possible
(?=\s\() - a positive lookahead that matches a location that is immediately followed with a whitespace and ( char.
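As a quick check that the consuming pattern gives the same captured text as the lookbehind version, here is a small sketch in Python. Python's re module does support lookbehind, unlike the VBScript engine, so both forms can be compared side by side; this only illustrates the idea and is not VBA code.

```python
import re

text = "Hey man, my regex is Super (Mega) Cool (SC)"

# Consuming prefix + capture group: the ", " is part of the overall match,
# but group 1 holds only the text we care about.
consuming = re.search(r",\s+(.*)(?=\s\()", text)

# Lookbehind version from the question, for comparison (works in Python,
# not in the VBScript engine).
lookbehind = re.search(r"(?<=, )(.*)(?= \()", text)

print(consuming.group(1))
print(lookbehind.group(0))
```

The whole match of the consuming pattern includes the leading comma and space, which is why the VBA code has to read M.SubMatches(0) rather than the full match value.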

Excel VBA Find & Replace Text In A String With Loop

I'm trying to replace all the text that follows each occurrence of the pattern "&CC[number]:[number]" in a string (up to the next occurrence) with "==".
Here is the string: "T &CC3:5 Q8 Party/ Self-Identify&CC6:8 Male&CC9:11 Female&CC12:15 Q1 Vote"
This is what I need it to look like: T &CC3:5==&CC6:8==&CC9:11==&CC12:15==
I know I need to loop through this string but I'm not sure the best way to set this up.
Dim stringOne As String
Dim regexOne As Object
Set regexOne = New RegExp
regexOne.Pattern = "([Q])+[0-9]"
regexOne.Global = False
stringOne = "T &CC3:5 Q8 Party/ Self-Identify&CC6:8 Male&CC9:11 Female&CC12:15 Q1 Vote"
Debug.Print regexOne.Replace(stringOne, "==")
End Sub
I have also explored using this regular expression regexOne.Pattern = "([&])+[C]+[C]+[0-9]+[:]+[0-9]"
I plan to eventually set the variable stringOne to Range("A1").Text

You could simplify the pattern a bit and use a capturing group and a positive lookahead
(&CC[0-9]+:[0-9]+).*?(?=&C|$)
Explanation
( Capture group 1
&CC[0-9]+:[0-9]+ Match &CC 1+ digits, : and 1+ digits
) Close group
.*? Match 0+ times any char except a newline non greedy
(?=&C|$) Positive lookahead, assert what is directly on the right is either &C or the end of the string
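In the replacement, use the first capturing group followed by ==, i.e. $1==, and set Global = True so that every occurrence is replaced rather than only the first.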
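For illustration, the same pattern and replacement can be tried in Python. The function name is made up, and Python writes the group reference as \1 where VBScript uses $1.

```python
import re

PATTERN = re.compile(r"(&CC[0-9]+:[0-9]+).*?(?=&C|$)")


def collapse_between_markers(s):
    # \1 re-inserts the captured marker; everything up to the next marker
    # (or the end of the string) is replaced by "==".
    return PATTERN.sub(r"\1==", s)


source = "T &CC3:5 Q8 Party/ Self-Identify&CC6:8 Male&CC9:11 Female&CC12:15 Q1 Vote"
print(collapse_between_markers(source))
```

Python's sub replaces every match by default, which corresponds to Global = True in the VBA RegExp object.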

RegEx to extract a word from mail's body

I need to extract a word from incoming mail's body.
I wrote a regex after consulting a few sites, but it neither returns a result nor throws an error.
Example: Description: sample text
I want only the first word after the colon.
Dim reg1 As RegExp
Dim M1 As MatchCollection
Dim M As Match
Dim EAI As String
Set reg1 = New RegExp
With reg1
    .Pattern = "Description\s*[:]+\s*(\w*)\s*"
    .Global = False
End With
If reg1.Test(Item.Body) Then
    Set M1 = reg1.Execute(Item.Body)
    For Each M In M1
        EAI = M.SubMatches(1)
    Next
End If

Note that your pattern works well though it is better written as:
Description\s*:+\s*(\w+)
And it will match Description, then 0+ whitespaces, 1+ : symbols, again 0 or more whitespaces and then will capture into Group 1 one or more word characters (as letters, digits or _ symbols).
Now, the Capture Group 1 value is stored in M.SubMatches(0). Besides, you need not run .Test() because if there are no matches, you do not need to iterate over them. You actually want to get a single match.
Thus, just use
Set M1 = reg1.Execute(Item.Body)
If M1.Count > 0 Then
    EAI = M1(0).SubMatches(0)
End If
Where M1(0) is the first match and .SubMatches(0) is the text residing in the first group.
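The single-match logic can be sketched in Python as follows (the function name is made up). Note the numbering difference: Python counts groups from 1, so group(1) here corresponds to SubMatches(0) in VBA.

```python
import re


def first_word_after_description(body):
    """Return the first word after 'Description:', or None if absent."""
    m = re.search(r"Description\s*:+\s*(\w+)", body)
    # Python numbers groups from 1, so group(1) corresponds to
    # SubMatches(0) in VBA.
    return m.group(1) if m else None


print(first_word_after_description("Description: sample text"))
```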

VBA Regex - Grab Hour HH:MM from String

Given an arbitrary string, I want to grab an hour (HH:MM) from the string.
Here is my regex:
^                           # Start of string
(?:                         # Try to match...
  (?:                       # Try to match...
    ([01]?\d|2[0-3]):       # HH:
  )?                        # (optionally).
  ([0-5]?\d):               # MM: (required)
)?                          # (entire group optional, so either HH:MM:, MM: or nothing)
$                           # End of string
And my code:
Public Sub RegexTest()
    Dim oRegex As Object
    Dim time_match As Object
    Set oRegex = CreateObject("vbscript.regexp")
    With oRegex
        .Global = True
        .Pattern = "^(?:(?:([01]?\d|2[0-3]):)?([0-5]?\d):)$" 'HH:MM
    End With
    Dim s As String: s = "START TIME: Mar. 3rd 2016 12:00am"
    Set time_match = oRegex.Execute(s)
    If time_match.Count = 1 Then
        Debug.Print time_match.Matches(0)
    Else
    End If
End Sub
However, I am unable to get a match here and get no output.

Your ^(?:(?:([01]?\d|2[0-3]):)?([0-5]?\d):)$ pattern only matches a full string that starts with an optional HH: part and an obligatory MM part followed by an obligatory :.
I suggest
(?:[01]?\d|2[0-3]):[0-5]\d
since you are matching only a part of the string, so the ^ and $ anchors have to go.
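A quick way to try the unanchored pattern is this Python sketch (the function name is made up; the pattern is the one suggested above):

```python
import re

TIME = re.compile(r"(?:[01]?\d|2[0-3]):[0-5]\d")


def find_time(s):
    """Return the first HH:MM found anywhere in s, or None."""
    m = TIME.search(s)
    return m.group(0) if m else None


print(find_time("START TIME: Mar. 3rd 2016 12:00am"))
```

The colon after "TIME" is skipped because no digit precedes it, and "2016" is skipped because no colon follows it.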

How to match the particular part from the nth index of the specific character?

I have input data such as:
"Thumbnail":"/images/7.0.2.5076_1/spacer.gif","URL":"http://id800/home/LayoutManager/l1.html/1407462681_292_2_2_1398567201/"
I want to match the l1.html part of it, which can be anything. That is, I want to match the part of the URL that occurs after the third-last occurrence of / and before the second-last occurrence of /. That part is either a number, an alphanumeric string, or an alphanumeric string with an .html extension. So basically I want to match the part between the 3rd and 2nd / from the end. I tried lots of combinations but could not come up with a working pattern. Any help would be great.

Pattern:
\".+?(\w+\.\w{3,5})\/.+?\"
\" matches the opening and closing quote
.+? matches one or more of any character, as few as possible
\w+ matches one or more word characters
\. matches a literal . (dot)
\w{3,5} matches 3 to 5 word characters
\/ matches / (forward slash)
() the parentheses capture the enclosed part as a separate group
Code in action:
string pattern = "\".+?(\\w+\\.\\w{3,5})\\/.+?\"";
string text = "\"Thumbnail\":\"/images/7.0.2.5076_1/spacer.gif\",\"URL\":\"http://id800/home/LayoutManager/l1.html/1407462681_292_2_2_1398567201/\"";
MatchCollection matches = Regex.Matches(text, pattern);
// Regex.Matches never returns null; check the count before indexing
if (matches.Count > 0)
{
    string value = matches[0].Groups[1].Value; // Output: l1.html
}
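The same pattern can be checked in Python with the sample data from the question. This is only a sketch for trying the pattern: as written, it finds a segment that has a 3 to 5 character extension and is followed by /, so it would miss the purely numeric or alphanumeric segments the question also mentions.

```python
import re

text = (
    '"Thumbnail":"/images/7.0.2.5076_1/spacer.gif",'
    '"URL":"http://id800/home/LayoutManager/l1.html/1407462681_292_2_2_1398567201/"'
)

# Same pattern as above; "/" needs no escaping in Python.
m = re.search(r'".+?(\w+\.\w{3,5})/.+?"', text)
segment = m.group(1) if m else None

print(segment)
```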

You have not provided the whole JSON string, but I think my snippet will help you get what you want anyway without regex. Add a reference to System.Web.Extensions, and use the following code:
Dim s As String = "[{""Thumbnail"":""/images/7.0.2.5076_1/spacer.gif"",""URL"":""http://id800/home/LayoutManager/l1.html/1407462681_292_2_2_1398567201/""}]"
Dim jss As New System.Web.Script.Serialization.JavaScriptSerializer()
Dim dict = jss.Deserialize(Of List(Of Object))(s)
For Each d In dict
    For Each v In d
        If v.Key = "URL" Then
            Dim tmp = v.Value.Trim("/"c).ToString().Split("/"c)
            MsgBox(tmp(tmp.Length - 2))
        End If
    Next
Next
The substring you need can be obtained without a regex by simply splitting the value on / and taking the second-to-last element.
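The split-based idea, sketched in Python with the standard json module. The function name is made up, and the surrounding [{...}] wrapper is the same assumption the VB snippet makes about the full JSON.

```python
import json

raw = (
    '[{"Thumbnail":"/images/7.0.2.5076_1/spacer.gif",'
    '"URL":"http://id800/home/LayoutManager/l1.html/1407462681_292_2_2_1398567201/"}]'
)


def second_to_last_segment(url):
    """Return the path segment before the last one, ignoring a trailing slash."""
    parts = url.strip("/").split("/")
    return parts[-2] if len(parts) >= 2 else None


for item in json.loads(raw):
    print(second_to_last_segment(item["URL"]))
```

Unlike the regex answer, this does not care whether the segment has an extension, which fits the question's note that it may be a plain number or alphanumeric string.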
