I have already managed to get the information onto single lines. I have a list of data like this:
1 1 838028476391 4 23 36 P 1/820-01 *
2 1 838028476490 4 23 36 P 1/820-17 *
3 1 838028474271 4 23 36 P 1/820-21 *
4 1 838028476292 4 23 36 P 1/820-21 *
5 1 838028474263 4 23 36 P 1/820-23 *
6 1 838028473802 4 23 36 P 1/820-21 *
I need the 12-digit number from every line. I tried this code:
Dim re As String
Dim re18 As String
re18 = "(\d{12})"
Dim r3 As New RegExp
r3.Pattern = re18
r3.IgnoreCase = True
r3.MultiLine = True
If r3.Test(Body) Then
Dim m3 As MatchCollection
Set m3 = r3.Execute(Body)
If m3.Item(0).SubMatches.Count > 0 Then
Dim number
For j = 1 To m3.Count
Set number = m3.Item(j - 1)
MsgBox ("Number: " + number)
Next
End If
End If
I only get the first match. Even when I debug the macro and inspect m3 in the Watch window, there is only one match. I also tried adding the quantifiers * or + after \d{12}.
How do I get this RegEx working?
I also have another question about regex. Say I want to match something AFTER a particular word. I would put the word at the beginning of the pattern, followed by the digits or whatever else I want. When I execute that regex, does the result INCLUDE the word I put at the beginning of the pattern?
For example, with "BUS \d{12}" I only want the number in the result. I just know that BUS comes before it.
You need to use the Global option, not MultiLine. MultiLine changes the behavior of the anchors (^ and $) so that they match at the beginning and end of each line, not just at the beginning and end of the whole text. Global is the option that tells the engine to find all the matches, not just the first one.
You probably don't need the SubMatches property either. Your regex has only one capturing group, and it captures the whole match. So each match's SubMatches collection contains only one item, SubMatches(0), and it is exactly the same as the match's own Value (m3.Item(0).SubMatches(0) equals m3.Item(0).Value). (Notice that the index of the first group is 0, not 1 as you would expect from working with other regex tools.)
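Here is a minimal Java sketch of the same fix, for readers outside VBA. Java is used only because it already appears on this page, and the class and method names are hypothetical. Java has no Global flag. Instead, `Matcher.find()` returns one match per call, so looping until it returns false plays the role of `Global = True`. `group()` returns the whole match, so no capturing group is needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TwelveDigitNumbers {
    // The question's pattern without the capturing group, which is not needed.
    private static final Pattern TWELVE_DIGITS = Pattern.compile("\\d{12}");

    // Collects every 12-digit run in the text, not just the first one.
    static List<String> findAll(String body) {
        List<String> numbers = new ArrayList<>();
        Matcher m = TWELVE_DIGITS.matcher(body);
        while (m.find()) {          // each call moves on to the next match
            numbers.add(m.group()); // group() is the whole match
        }
        return numbers;
    }

    public static void main(String[] args) {
        String body = String.join("\n",
            "1 1 838028476391 4 23 36 P 1/820-01 *",
            "2 1 838028476490 4 23 36 P 1/820-17 *",
            "3 1 838028474271 4 23 36 P 1/820-21 *");
        for (String number : findAll(body)) {
            System.out.println("Number: " + number);
        }
    }
}
```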
Your second question is where the SubMatches property comes in. The overall match does include "BUS", but a capturing group lets you pull out only the digits. If you wanted to find every 12-digit number that follows the word "BUS", you would use a regex like this:
BUS\s*(\d{12})
...and you would retrieve the number from each match like this:
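r3.Pattern = "BUS\s*(\d{12})"
r3.Global = True                  ' return every match, not only the first
Set m3 = r3.Execute(Body)
Dim myMatch As Match
For Each myMatch In m3
    ' SubMatches(0) holds only the captured digits, without "BUS"
    MsgBox "Number: " & myMatch.SubMatches(0)
Next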
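Here is the same idea sketched in Java, with hypothetical class and method names. Note that Java numbers groups from 1, and group(0) is the whole match, whereas VBScript's SubMatches starts at 0. The whole match still contains "BUS", but group 1 holds only the digits:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NumbersAfterBus {
    // "BUS" must be present for a match, but only the digits are captured in group 1.
    private static final Pattern BUS_NUMBER = Pattern.compile("BUS\\s*(\\d{12})");

    // Returns only the captured digits, without the "BUS" prefix.
    static List<String> numbersAfterBus(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = BUS_NUMBER.matcher(text);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        Matcher m = BUS_NUMBER.matcher("BUS 838028476391");
        if (m.find()) {
            System.out.println(m.group(0)); // whole match: BUS 838028476391
            System.out.println(m.group(1)); // group 1 only: 838028476391
        }
    }
}
```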
See this page for more info.
I have a set of strings that contain some letters, occasionally a single digit, and then, somewhere, a run of 2 or 3 digits. I need to match those 2 or 3 digits.
I have this:
\w*(\d{2,3})\w*
but then for strings like
AAA1AAA12A
AAA2AA123A
it matches '12' and '23' respectively, i.e. it fails to pick the three digits in the second case.
How do I get those 3 digits?
Here is how you would do it in Java.
The regex matches a group of 2 or 3 digits with a non-digit on each side.
The while loop calls find() repeatedly to keep finding matches and prints the captured group each time. The single digits (1 and 2) and the four-digit run 1223 are ignored.
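import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "AAA1AAA12Aksk2ksksk21sksksk123ksk1223sk";
// a non-digit, then 2 or 3 digits (captured as group 1), then a non-digit
String regex = "\\D(\\d{2,3})\\D";
Matcher m = Pattern.compile(regex).matcher(s);
while (m.find()) {
    System.out.println(m.group(1)); // print only the captured digits
}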
prints
12
21
123
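One limitation of \D(\d{2,3})\D is that it needs an actual non-digit character on both sides, and it consumes them. A run at the very start or end of the string is therefore missed. In something like "x12a34x", the "a" is used up by the first match, which leaves "34" unmatched. The sketch below swaps in zero-width lookarounds, (?<!\d) and (?!\d), which only check the neighboring characters. This is an alternative technique, not the one from the answer above, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TwoOrThreeDigits {
    // (?<!\d) and (?!\d) require "no digit" (or the string edge) on each side
    // without consuming anything, so adjacent runs can both match.
    private static final Pattern RUN = Pattern.compile("(?<!\\d)\\d{2,3}(?!\\d)");

    static List<String> runs(String s) {
        List<String> found = new ArrayList<>();
        Matcher m = RUN.matcher(s);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(runs("AAA1AAA12Aksk2ksksk21sksksk123ksk1223sk")); // [12, 21, 123]
        System.out.println(runs("x12a34x"));                                // [12, 34]
    }
}
```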
Looks like the correct answer would be:
\w*?(\d{2,3})\w*
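Basically, making the preceding \w* lazy does the job. With the greedy \w*, the engine first consumes the whole string and then backtracks only until (\d{2,3}) can match, so the group gets just the last two digits ("23" out of "123"). The lazy \w*? takes as little as possible instead. The group therefore starts at the first position followed by 2 or 3 digits and captures all of them.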
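Here is a small Java sketch, with a hypothetical class name, that compares the two patterns on the second sample string:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsLazy {
    static final Pattern GREEDY = Pattern.compile("\\w*(\\d{2,3})\\w*");
    static final Pattern LAZY = Pattern.compile("\\w*?(\\d{2,3})\\w*");

    // Group 1 of the first match, or null if the pattern does not match.
    static String firstGroup(Pattern p, String s) {
        Matcher m = p.matcher(s);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String s = "AAA2AA123A";
        // Greedy \w* first takes the whole string, then gives characters back only
        // until (\d{2,3}) can match, which leaves just "23" for the group.
        System.out.println(firstGroup(GREEDY, s)); // 23
        // Lazy \w*? takes as little as possible, so the group starts at the first
        // position followed by 2 or 3 digits and captures all of "123".
        System.out.println(firstGroup(LAZY, s));   // 123
    }
}
```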
I'm having trouble writing code that picks up the pattern I want. I want to grab the first number that appears after the words "5 Months" in a .txt file. Any other characters (A-Z, parentheses, $, %, etc.) should be ignored. I keep getting a VBA error, "Invalid procedure call or argument".
Currently, I have code that looks like this:
Dim reg4 As Object: Set reg4 = CreateObject("vbscript.regexp")
reg4.Pattern = "5 Months\s*([\d+]\.[\d+])\s*"
Dim MCS As Object
Set MCS = reg4.Execute(myText)
Dim Months5 As String: Months5 = MCS(0).submatches(0)   ' the error stems from this line
Here myText is a string holding the contents of a text file. My main problem is that the file is not always in a standardized format. So when I try to extract the first number after "5 Months", I get that error.
The text file could look like:
EXAMPLE 1
5 Months
($) (%) (Months) (%) (%) (%) ($) (Months)
0.00 0.0000 0.000
OR
EXAMPLE 2
5 Months
0.00
0.000
0.000
In both cases, I would ideally be able to extract that first number "0.00" in its entire form, while ignoring any other characters such as (%) or ($) as shown in example 1.
Does anyone have suggestions on how to rewrite the pattern so that it picks up the first number, including the digits after its decimal point?
Many thanks in advance!
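Your regex cannot handle the strings you showed. [\d+] is a character class that matches a single digit or a literal +, so [\d+]\.[\d+] matches exactly one character on each side of the dot (at best "0.0" instead of "0.00"). Also, \s* cannot skip over text such as ($) (%), so example 1 does not match at all. When nothing matches, MCS is empty and MCS(0) raises "Invalid procedure call or argument". You can use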
\b5 Months[\s\S]*?(\d+(?:\.\d+)?)
See the regex demo. Details:
\b - a word boundary
5 Months - a literal text
[\s\S]*? - any 0 or more chars, as few as possible
(\d+(?:\.\d+)?) - Capturing group 1: one or more digits followed by an optional sequence of a . and one or more digits.
Test run in VBA:
Sub TestFn()
Dim reg4 As Object: Set reg4 = CreateObject("vbscript.regexp")
reg4.Pattern = "\b5 Months[\s\S]*?(\d+(?:\.\d+)?)"
Dim myText As String
myText = "5 Months" & vbCrLf & vbCrLf & "0.00"
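Dim MCS As Object
Set MCS = reg4.Execute(myText)
If MCS.Count > 0 Then   ' MCS(0) raises "Invalid procedure call or argument" when nothing matched
    Dim Months5 As String: Months5 = MCS(0).SubMatches(0)
    Debug.Print Months5
End If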
End Sub
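For comparison, here is a Java sketch, with hypothetical class and method names, that runs both the question's pattern and the answer's pattern on the two sample texts. It returns an empty Optional when nothing matches, which is exactly the situation in which MCS(0) fails in VBA:

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstNumberAfterFiveMonths {
    // Pattern from the answer: [\s\S]*? lazily skips any text, including line breaks.
    static final Pattern ANSWER = Pattern.compile("\\b5 Months[\\s\\S]*?(\\d+(?:\\.\\d+)?)");
    // Pattern from the question: [\d+] is a class matching ONE digit or a literal '+'.
    static final Pattern QUESTION = Pattern.compile("5 Months\\s*([\\d+]\\.[\\d+])\\s*");

    // Group 1 of the first match, or empty when nothing matches.
    static Optional<String> firstNumber(Pattern p, String text) {
        Matcher m = p.matcher(text);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String example1 = "5 Months\n($) (%) (Months) (%) (%) (%) ($) (Months)\n0.00 0.0000 0.000";
        String example2 = "5 Months\n0.00\n0.000\n0.000";
        System.out.println(firstNumber(ANSWER, example1));   // Optional[0.00]
        System.out.println(firstNumber(ANSWER, example2));   // Optional[0.00]
        System.out.println(firstNumber(QUESTION, example1)); // Optional.empty
        System.out.println(firstNumber(QUESTION, example2)); // Optional[0.0]
    }
}
```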
I am having a hard time converting the following regular expression to Erlang syntax.
What I have is a test string like this:
1,2 ==> 3 #SUP: 1 #CONF: 1.0
And the regex that I created with regex101 is this (see below):
([\d,]+).*==>\s*(\d+)\s*#SUP:\s*(\d)\s*#CONF:\s*(\d+.\d+)
But I get weird match results when I convert it to Erlang. Here is my attempt:
{ok, M} = re:compile("([\\d,]+).*==>\\s*(\\d+)\\s*#SUP:\\s*(\\d)\\s*#CONF:\\s*(\\d+.\\d+)").
re:run("1,2 ==> 3 #SUP: 1 #CONF: 1.0", M).
Also, I get more than four matches. What am I doing wrong?
Here is the regex101 version:
https://regex101.com/r/xJ9fP2/1
I don't know much about Erlang, but I will try to explain. With your regex:
>{ok, M} = re:compile("([\\d,]+).*==>\\s*(\\d+)\\s*#SUP:\\s*(\\d)\\s*#CONF:\\s*(\\d+.\\d+)").
>re:run("1,2 ==> 3 #SUP: 1 #CONF: 1.0", M).
{match,[{0, 28},{0,3},{8,1},{16,1},{25,3}]}
In each {Start, Length} pair, Start is the starting index of the match and Length is the total number of characters matched from that index.
Reason for more than four groups
The first element always describes the part of the string matched by the complete regex. The rest are the four capturing groups you want, so there are five entries in total.
([\\d,]+).*==>\\s*(\\d+)\\s*#SUP:\\s*(\\d)\\s*#CONF:\\s*(\\d+.\\d+)
First group: ([\\d,]+)
Second group: (\\d+)
Third group: (\\d)
Fourth group: (\\d+.\\d+)
The regex as a whole matches the entire string, and that is the first entry you are getting (the zeroth group).
How to get the desired answer
Here we want everything except the first entry (the entire match), so we can use all_but_first to skip it:
> re:run("1,2 ==> 3 #SUP: 1 #CONF: 1.0", M, [{capture, all_but_first, list}]).
{match,["1,2","3","1","1.0"]}
More info can be found here
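For readers coming from another language, here is a Java sketch with hypothetical names. It shows the same two views of the match: the {Start, Length} pairs for group 0 plus the four capturing groups, and the "all but first" list of captured strings. Java's Matcher.start(g) and end(g) give the offsets, so the length is end minus start. The regex is the question's, unchanged, so its unescaped . in \d+.\d+ still matches any character:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupOffsets {
    // The question's regex as a Java string literal (backslashes doubled, as in Erlang).
    static final Pattern P = Pattern.compile(
        "([\\d,]+).*==>\\s*(\\d+)\\s*#SUP:\\s*(\\d)\\s*#CONF:\\s*(\\d+.\\d+)");

    // {start, length} for group 0 (the whole match) and every capturing group,
    // mirroring the list that re:run returns by default.
    static List<int[]> offsets(String s) {
        List<int[]> result = new ArrayList<>();
        Matcher m = P.matcher(s);
        if (m.find()) {
            for (int g = 0; g <= m.groupCount(); g++) {
                result.add(new int[] {m.start(g), m.end(g) - m.start(g)});
            }
        }
        return result;
    }

    // Groups 1..n only, the analogue of {capture, all_but_first, list}.
    static List<String> allButFirst(String s) {
        List<String> result = new ArrayList<>();
        Matcher m = P.matcher(s);
        if (m.find()) {
            for (int g = 1; g <= m.groupCount(); g++) {
                result.add(m.group(g));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "1,2 ==> 3 #SUP: 1 #CONF: 1.0";
        for (int[] o : offsets(s)) {
            System.out.println("{" + o[0] + "," + o[1] + "}");
        }
        System.out.println(allButFirst(s)); // [1,2, 3, 1, 1.0]
    }
}
```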
If you are in doubt about what the string actually contains, you can print it and check:
1> RE = "([\\d,]+).*==>\\s*(\\d+)\\s*#SUP:\\s*(\\d)\\s*#CONF:\\s*(\\d+.\\d+)".
"([\\d,]+).*==>\\s*(\\d+)\\s*#SUP:\\s*(\\d)\\s*#CONF:\\s*(\\d+.\\d+)"
2> io:format("RE: /~s/~n", [RE]).
RE: /([\d,]+).*==>\s*(\d+)\s*#SUP:\s*(\d)\s*#CONF:\s*(\d+.\d+)/
For the rest of the issue, see the great answer by rock321987.
I'm trying to replicate Google Calendar's way of creating an appointment from a narrative. I want to enter "5pm Happy Hour for 1 hour" and ultimately parse it into an Outlook AppointmentItem.
My problem, I think, is that I have a large chunk of optional text at the end. Because it's optional, the regex passes, but the submatch doesn't get populated since it isn't required for the match. I want it populated because I want to use the submatches as my parsing engine.
I have a bunch of test cases in column A (I'm working in Excel and will move to Outlook later), and my code lists the submatches to the right. This is a representative sample of potential input:
1. 5pmCST Happy Hour for 1 hour
2. 5pm CST Happy Hour for 1 hour
3. 5pm Happy Hour for 1 hour
4. 5 pm Happy Hour for 1 hour
5. 5 pm CST Happy Hour for 1 hour
6. 5 Happy Hour for 1 hour
7. 5 Happy Hour
8. 5pmCST Happy Hour
9. 5pm CST Happy Hour
10. 5pm Happy Hour
11. 5:00CST Happy Hour for 1 hour
12. 5:00 CST Happy Hour for 1 hour
Here's the code that runs the tests:
Sub testest()
Dim RegEx As VBScript_RegExp_55.RegExp
Dim Matches As VBScript_RegExp_55.MatchCollection
Dim Match As VBScript_RegExp_55.Match
Dim rCell As Range
Dim SubMatch As Variant
Dim lCnt As Long
Dim aPattern(1 To 8) As String
Set RegEx = New VBScript_RegExp_55.RegExp
aPattern(1) = "(1?[0-9](:[0-5][0-9])?)" 'time
aPattern(2) = "( ?)" 'optional space
aPattern(3) = "([ap]m)?" 'optional ampm
aPattern(4) = "( ?)" 'optional space
aPattern(5) = "([ECMP][DS]T)?" 'optional time zone
aPattern(6) = "( ?)" 'optional space
aPattern(7) = "(.+?)" 'event description
aPattern(8) = "(( for )([1-2]?[0-9](.[0-9]?[0-9])?)( hours?))?" 'optional duration
RegEx.Pattern = Join(aPattern, vbNullString)
Debug.Print RegEx.Pattern
Sheet1.Range("C1").Resize(1000, 100).ClearContents
For Each rCell In Sheet1.Range("A1").CurrentRegion.Columns(1).Cells
lCnt = 0
rCell.Offset(0, 2).Value = RegEx.test(rCell.Text)
If RegEx.test(rCell.Text) Then
Set Matches = RegEx.Execute(rCell.Text)
For Each Match In Matches
For Each SubMatch In Match.SubMatches
lCnt = lCnt + 1
rCell.Offset(0, 2 + lCnt).Value = SubMatch
Next SubMatch
Next Match
End If
Next rCell
End Sub
The pattern is
(1?[0-9](:[0-5][0-9])?)( ?)([ap]m)?( ?)([ECMP][DS]T)?( ?)(.+?)(( for )([1-2]?[0-9](.[0-9]?[0-9])?)( hours?))?
The non-empty submatches for #1 are 5, pm, CST and H.
It stops matching at the "H" in Happy Hour because everything starting with " for " is optional. The lazy (.+?) takes a single character, and the optional duration group is then allowed to match nothing.
(1?[0-9](:[0-5][0-9])?)( ?)([ap]m)?( ?)([ECMP][DS]T)?( ?)(.+?)( for )([1-2]?[0-9](.[0-9]?[0-9])?)( hours?)
But #7-#10 don't pass because they don't have a duration. The submatches for #1 do give me what I want, though: 5, pm, CST, Happy Hour, for, 1 and hour.
I want every possible submatch filled, even if VBScript doesn't need it to make the regex pass. I fear this is just how it works and that I'm trying to get regex to do my parsing work for me. I considered running the input through increasingly restrictive patterns until one fails and then using the last passing pattern, but that seems kludgy.
Is it possible to get regex to fill those submatches?
I have assumed that each line is the entire contents of a single cell, so I can use anchors.
I also don't think you need as many capturing groups as you have. I set up the regex with:
Group 1 Time
Group 2 am/pm
Group 3 Time Zone
Group 4 Description
Group 5 Hours (and fractions of hours)
With your data in A2:An, the following routine parses the data into the adjacent columns. It doesn't matter if a Submatch is "not filled". You could also fill elements in an array, or whatever else you want to do. If you want more submatches, you can always either add capturing groups for the optional spaces, or change the relevant non-capturing groups to capturing groups.
Also, since the "for" is optional, I chose to use a lookahead to determine the end of the description: it ends either at a \s+for\s+ sequence or at the end of the line. Since I have assumed there is only one entry, and one line, per cell, the MultiLine and Global properties are irrelevant.
Whitespace has to be required before and after "for" to avoid problems if those letters appear inside the description (as in "Before").
Option Explicit
'set Reference to Microsoft VBScript Regular Expressions 5.5
Sub ParseAppt()
Dim R As Range, C As Range
Dim RE As RegExp, MC As MatchCollection
Dim I As Long
Set R = Range("a2", Cells(Rows.Count, "A").End(xlUp))
Set RE = New RegExp
With RE
.Pattern = "((?:1[0-2]|0?[1-9])(?::[0-5]\d)?)\s*([ap]m)?\s*([ECMP][DS]T)?\s*(.*?(?=\s+for\s+|$))(?:\s+for\s+(\d+(?:\.\d+)?)\s*hour)?"
.IgnoreCase = True
For Each C In R
If .Test(C.Text) = True Then
Set MC = .Execute(C.Text)
For I = 0 To 4
C.Offset(0, I + 1) = MC(0).SubMatches(I)
Next I
End If
Next C
End With
End Sub
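Here is a Java sketch, with hypothetical names, that contrasts the question's pattern with the answer's lookahead-based one. The time-zone class is written [ECMP] as in the question. Java's Matcher.group returns null for a group that did not take part in the match, which is the analogue of a submatch that is "not filled". The example inputs omit the "1. " numbering.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ApptPatterns {
    // The question's pattern: the lazy (.+?) (group 8) stops after one character,
    // because the optional duration group that follows may match nothing.
    static final Pattern QUESTION = Pattern.compile(
        "(1?[0-9](:[0-5][0-9])?)( ?)([ap]m)?( ?)([ECMP][DS]T)?( ?)(.+?)"
        + "(( for )([1-2]?[0-9](.[0-9]?[0-9])?)( hours?))?");

    // The answer's pattern: the description (group 4) runs up to " for " or the end of input.
    static final Pattern ANSWER = Pattern.compile(
        "((?:1[0-2]|0?[1-9])(?::[0-5]\\d)?)\\s*([ap]m)?\\s*([ECMP][DS]T)?\\s*"
        + "(.*?(?=\\s+for\\s+|$))(?:\\s+for\\s+(\\d+(?:\\.\\d+)?)\\s*hour)?",
        Pattern.CASE_INSENSITIVE);

    // Groups 1..n of the first match; a group that did not participate is null.
    static List<String> groups(Pattern p, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(text);
        if (m.find()) {
            for (int g = 1; g <= m.groupCount(); g++) {
                out.add(m.group(g));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(groups(QUESTION, "5pmCST Happy Hour for 1 hour").get(7)); // H
        System.out.println(groups(ANSWER, "5pmCST Happy Hour for 1 hour")); // [5, pm, CST, Happy Hour, 1]
        System.out.println(groups(ANSWER, "5 Happy Hour"));                 // [5, null, null, Happy Hour, null]
    }
}
```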
OK here is what I have:
(24(?:(?!24).)*)
It works in that it finds everything from one 24 up to the next 24, but it does not include the second 24... (wow, some logic).
For example, with this input:
23252882240013152986400000006090000000787865670000004524232528822400513152986240013152986543530000452400
It matches from each 24 up to, but not including, the next 24, so the strings it finds are:
23252882 - 2400131529864000000060900000007878656700000045 - 2423252882 - 2400513152986 - 24001315298654353000045 - 2400
That is half of what I want it to do. What I need it to find is this:
23252882 - 2400131529864000000060900000007878656700000045 - 2423252882240051315298624001315298654353000045 - 2400
Let's say:
x = 24
n = 46
I need to:
find x followed by n characters, where the (n+1)th character is x again.
So: find the start, take the next 46 characters (counting the 24 itself), and the 47th character must be the start of the next string. All 24s inside that string should be included.
Hope this is clear.
Thanks in advance.
EDIT
answer = 24.{44}(?=24)
You're almost there.
First, find x (24):
24
Then, find n=46 characters, where the 46 includes the original 24 (hence 44 left):
.{44}
The characters that follow must be x (24). A lookahead checks this without consuming them:
(?=24)
All together:
24.{44}(?=24)
You can play around with it here.
In terms of constructing such a regex from a given x, n, your regex consists of
x.{n-number_of_characters(x)}(?=x)
where you substitute x as-is (escaping any regex metacharacters it contains) and compute n-number_of_characters(x).
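The construction rule above can be sketched in Java, with hypothetical names. This sketch quotes x with Pattern.quote so that characters such as . in x are taken literally, and it assumes n >= x.length(). Because the trailing x sits in a lookahead, it is not consumed, so the next find() can start a new segment right there:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FixedLengthSegments {
    // Builds x.{n - len(x)}(?=x). Pattern.quote makes x literal; assumes n >= x.length().
    static Pattern build(String x, int n) {
        String qx = Pattern.quote(x);
        return Pattern.compile(qx + ".{" + (n - x.length()) + "}(?=" + qx + ")");
    }

    // Every n-character segment that starts with x and is immediately followed by x.
    static List<String> segments(String data, String x, int n) {
        List<String> found = new ArrayList<>();
        Matcher m = build(x, n).matcher(data);
        while (m.find()) {
            // The lookahead does not consume the following x, so the next
            // segment can start exactly where this one ended.
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String data = "23252882240013152986400000006090000000787865670000004524232528822400513152986240013152986543530000452400";
        for (String s : segments(data, "24", 46)) {
            System.out.println(s);
        }
    }
}
```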
Try this:
(?(?=24)(.{46})|(.{25})(.{24}))
Explanation:
Options: case insensitive; ^ and $ match at line breaks
Do a test and then proceed with one of two options depending on the result of the test «(?(?=24)(.{46})|(.{25})(.{24}))»
  Assert that the regex below can be matched, starting at this position (positive lookahead) «(?=24)»
    Match the characters “24” literally «24»
  If the test succeeded, match the regular expression below «(.{46})»
    Match the regular expression below and capture its match into backreference number 1 «(.{46})»
      Match any single character that is not a line break character «.{46}»
        Exactly 46 times «{46}»
  If the test failed, match the regular expression below «(.{25})(.{24})»
    Match the regular expression below and capture its match into backreference number 2 «(.{25})»
      Match any single character that is not a line break character «.{25}»
        Exactly 25 times «{25}»
    Match the regular expression below and capture its match into backreference number 3 «(.{24})»
      Match any single character that is not a line break character «.{24}»
        Exactly 24 times «{24}»