Cannot seem to find the index where regex gets a match

I'm using vb.net and something that seems obvious in PHP does not work (for me) in vb.net:
Extract = "100011100000"
Dim HandReg3 As New Regex("(?:[0]*)(1{3,})(?:[0]*)")
If HandReg3.IsMatch(Extract) Then
    Dim m() As String = HandReg3.Split(Extract)
    For Each item As String In m
        Console.WriteLine(item)
    Next
    Dim m1 As Match = Regex.Match(Extract, "(?:[0]*)(1{3,})(?:[0]*)")
    Console.WriteLine(m1.Index)
    Console.WriteLine(m1.Length)
End If
Which writes:
1
111
1 <-Index. Shouldn't it be 4 (0 based) or 5????
111
I've tried several combinations of groups (capturing and not capturing).
Of course it has to be something obvious and stupid, but after almost 8 hours I just can't get it right!
Tried:
([0-1]*)(1{3,})([0-1]*)
[0-1]*1{3,}[0-1]*
and every other combination that I could think of in between.
Thanks for your help
Emiliano
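
A likely explanation, sketched here in Python rather than VB.NET (so the function names below are Python's, not .NET's): the reported index is where the whole match starts, and the whole match includes the zeros consumed by the leading non-capturing group. The run of ones begins at the start of capture group 1 instead. In .NET the corresponding value should be m1.Groups(1).Index. The variable names are illustrative only.

```python
import re

extract = "100011100000"
pattern = re.compile(r"(?:0*)(1{3,})(?:0*)")

m = pattern.search(extract)

# The overall match begins at the first zero, because the leading
# non-capturing group still consumes characters.
whole_start = m.start()
whole_length = len(m.group(0))

# Capture group 1 holds only the run of ones.
ones_start = m.start(1)
ones_length = len(m.group(1))

print(whole_start, whole_length)
print(ones_start, ones_length)
```

A non-capturing group only stops the text from being stored as a group; it does not exclude that text from the overall match.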

Related

regex for excluding text at end of string

I have a regular expression (built in Adobe JavaScript) which finds a string that can be of varying length.
The part I need help with is that, when the string is found, I need to exclude the extra characters at the end, which always begin with 1 1.
This is the expression:
var re = new RegExp(/WASH\sHANDLING\sPLANT\s[-A-z0-9 ]{2,90}/);
This is the result:
WASH HANDLING PLANT SIZING STATION SERVICES SHEET 1 1 75 MOR03 MUP POS SU W ST1205 DWG 0001
I need to modify the regex to exclude the string in bold beginning with the 1 1.
Keep in mind the string searched for can be of varying length hence the {2,90}
Can anyone please advise how to modify the regex to exclude everything from the 1 1 onward?
Thank you

You may use a positive lookahead and keep the same functionality:
/WASH\sHANDLING\sPLANT\s[-A-Za-z0-9 ]{2,90}(?=\b1 1\b)/
                                           ^^^^^^^^^^^
The (?=\b1 1\b) lookahead requires 1 1 as whole "word" after your match.
See the regex demo
Also, note that [A-z] matches more than just letters.
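
As a rough illustration in Python (whose re module supports the same lookahead syntax), the sketch below runs the pattern against the sample text from the question and then lists the non-letter characters that [A-z] accepts. The variable names are made up for the example.

```python
import re

text = ("WASH HANDLING PLANT SIZING STATION SERVICES SHEET 1 1 75 "
        "MOR03 MUP POS SU W ST1205 DWG 0001")

# The lookahead (?=\b1 1\b) must succeed right after the match,
# but the "1 1" itself is not consumed.
pattern = re.compile(r"WASH\sHANDLING\sPLANT\s[-A-Za-z0-9 ]{2,90}(?=\b1 1\b)")
match = pattern.search(text)

# The character class allows spaces, so the match keeps the space
# before "1 1"; strip it if it is not wanted.
title = match.group(0).rstrip()
print(title)

# [A-z] spans ASCII 65..122, which includes six punctuation
# characters that sit between "Z" and "a".
extras = [ch for ch in "[\\]^_`" if re.fullmatch(r"[A-z]", ch)]
print(extras)
```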

Regex newbie: How to isolate 'num-num-num' in a string

I'm sure this is a super simple question for many of you, but I've only just started learning regex and at the moment can't for the life of me isolate what I'm after from the following:
June 2015 - Won / Void / Lost = 3-0-1
I need a solution to isolate the 'num-num-num' part at the end of the string that would work for any positive integers.
Thanks for any help
EDIT
So this line of code from a scrapy spider I'm writing produces the line above:
tips_str = sel.xpath('//*[@class="recent-picks"]//div[@class="title3"]/text()').extract()[0]
I've tried to isolate the part I'm after with:
tips_str = sel.xpath('//*[@class="recent-picks"]//div[@class="title3"]/text()').re(r'\d+-\d+-\d+$').extract()[0]
No luck though :(

The regex to capture that is:
\d+-\d+-\d+$
It works as follows:
\d+- means: match 1 or more digits (the characters [0-9]), and then a "-".
$ means: you should now be at the end of the line.
Translating that into the full regex pattern:
Capture 1 or more digits, then a hyphen, then 1 or more digits, then a hyphen, then 1 or more digits, and we should now be at the end of the string.
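
A small stand-alone sketch of that behaviour, using the sample line from the question as a plain Python string (no scrapy involved). The second search appends some made-up trailing text to show what the $ anchor contributes.

```python
import re

pattern = re.compile(r"\d+-\d+-\d+$")

line = "June 2015 - Won / Void / Lost = 3-0-1"
found = pattern.search(line)
record = found.group(0) if found else None
print(record)

# With trailing text after the numbers, the "$" anchor can no longer
# be satisfied, so search() returns None.
trailing = pattern.search(line + " (updated)")
print(trailing)
```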
EDIT: Addressing your edits and comments:
I'm not so sure what you mean by "isolate". I'll assume that you mean you want tips_str to equal "3-0-1".
I believe the easiest way would be to first use xpath to extract the string for the entire line without doing any regex. Then, when we're simply dealing with a string (instead of xpath stuff), it should be nice and easy to use regex and get the pattern out.
As far as I understand, sel.xpath('//*[@class="recent-picks"]//div[@class="title3"]/text()').extract()[0] (without .re()) is providing you with the string: "June 2015 - Won / Void / Lost = 3-0-1".
So then:
full_str = sel.xpath('//*[@class="recent-picks"]//div[@class="title3"]/text()').extract()[0]
Now that we've got the full string, we can use standard string regex to pluck the part we want out:
import re
tips_str = False
search = re.search(r'\d+-\d+-\d+$', full_str)
if search:
    tips_str = search.group(0)
Now tips_str will equal "3-0-1". If the pattern wasn't matched at all, it'd instead equal False.
If any of my assumptions are wrong then let me know what's actually happening (like if .extract()[0] isn't giving back a string, then what is it giving back?) and I'll try to adjust this response.

Any and all numbers, so negatives, scientific notation, etc? This will match it.
/(\-?[\.\d]+(e\+|e\-)?[\.\d]*)-(\-?[\.\d]+(e\+|e\-)?[\.\d]*)-(\-?[\.\d]+(e\+|e\-)?[\.\d]*)$/ig
Tested with these:
June 2015 - Won / Void / Lost = -1.1e+3-1.01-0.1e+2
June 2015 - Won / Void / Lost = 1-2-3
June 2015 - Won / Void / Lost = 0.1--5-5.6
If you take $ out of it, it will match on all lines at the same time.

Retrieve a certain text in a string

I'd like a way to retrieve some text from a string in a C# script.
The format of the text is 4 digits, then _, then 1 to 2 digits:
test_p_2008_1_Annexe_1_prix
test_p_2008_100_Annexe_1_prix
test_p_2008_1
test_p_2008_100
For these 4 examples, I need to get
2008_1
2008_100
2008_1
2008_100
Maybe a regex would work, but I'm not good enough with them.

I think you're trying to retrieve text in the format of 4 digits, then _, then 1 to 3 digits.
@"\d{4}_\d{1,3}"
Code:
String input = @"test_p_2008_1_Annexe_1_prix
test_p_2008_100_Annexe_1_prix
test_p_2008_1
test_p_2008_100";
Regex rgx = new Regex(@"\d{4}_\d{1,3}");
foreach (Match m in rgx.Matches(input))
    Console.WriteLine(m.Groups[0].Value);
IDEONE
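
For comparison, here is the same pattern sketched in Python against the four sample names from the question. It takes the first hit per name, on the assumption that each name contains only one such code.

```python
import re

names = [
    "test_p_2008_1_Annexe_1_prix",
    "test_p_2008_100_Annexe_1_prix",
    "test_p_2008_1",
    "test_p_2008_100",
]

# Four digits, an underscore, then one to three digits.
pattern = re.compile(r"\d{4}_\d{1,3}")

# search() returns the first (leftmost) hit in each name.
codes = [pattern.search(name).group(0) for name in names]
print(codes)
```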

VBScript Regex Fill Submatches even when not Required for the Match

I'm trying to replicate Google calendar's method of creating an appointment from a narrative. I want to enter 5pm Happy Hour for 1 hour and parse it into, ultimately, an Outlook AppointmentItem.
My problem, I think, is I have a large chunk of optional text at the end. And because it's optional, the regex passes but the submatch doesn't get populated because it isn't required for the match. I want it to populate because I want to use the submatches as my parsing engine.
I have a bunch of test cases in column A (working in Excel, then will move to Outlook), and my code lists out the submatches to the right. This is a representative sample of potential input
1. 5pmCST Happy Hour for 1 hour
2. 5pm CST Happy Hour for 1 hour
3. 5pm Happy Hour for 1 hour
4. 5 pm Happy Hour for 1 hour
5. 5 pm CST Happy Hour for 1 hour
6. 5 Happy Hour for 1 hour
7. 5 Happy Hour
8. 5pmCST Happy Hour
9. 5pm CST Happy Hour
10. 5pm Happy Hour
11. 5:00CST Happy Hour for 1 hour
12. 5:00 CST Happy Hour for 1 hour
Here's the code that runs the tests
Sub testest()
    Dim RegEx As VBScript_RegExp_55.RegExp
    Dim Matches As VBScript_RegExp_55.MatchCollection
    Dim Match As VBScript_RegExp_55.Match
    Dim rCell As Range
    Dim SubMatch As Variant
    Dim lCnt As Long
    Dim aPattern(1 To 8) As String
    Set RegEx = New VBScript_RegExp_55.RegExp
    aPattern(1) = "(1?[0-9](:[0-5][0-9])?)" 'time
    aPattern(2) = "( ?)" 'optional space
    aPattern(3) = "([ap]m)?" 'optional ampm
    aPattern(4) = "( ?)" 'optional space
    aPattern(5) = "([ECMP][DS]T)?" 'optional time zone
    aPattern(6) = "( ?)" 'optional space
    aPattern(7) = "(.+?)" 'event description
    aPattern(8) = "(( for )([1-2]?[0-9](.[0-9]?[0-9])?)( hours?))?" 'optional duration
    RegEx.Pattern = Join(aPattern, vbNullString)
    Debug.Print RegEx.Pattern
    Sheet1.Range("C1").Resize(1000, 100).ClearContents
    For Each rCell In Sheet1.Range("A1").CurrentRegion.Columns(1).Cells
        lCnt = 0
        rCell.Offset(0, 2).Value = RegEx.test(rCell.Text)
        If RegEx.test(rCell.Text) Then
            Set Matches = RegEx.Execute(rCell.Text)
            For Each Match In Matches
                For Each SubMatch In Match.SubMatches
                    lCnt = lCnt + 1
                    rCell.Offset(0, 2 + lCnt).Value = SubMatch
                Next SubMatch
            Next Match
        End If
    Next rCell
End Sub
The pattern is
(1?[0-9](:[0-5][0-9])?)( ?)([ap]m)?( ?)([ECMP][DS]T)?( ?)(.+?)(( for )([1-2]?[0-9](.[0-9]?[0-9])?)( hours?))?
The submatches for #1 are
1 2 3 4 5 6 7
5 pm CST H
It stops matching at the "H" in Happy Hour because everything starting with the " for " is optional. If I remove the optional part, my pattern becomes
(1?[0-9](:[0-5][0-9])?)( ?)([ap]m)?( ?)([ECMP][DS]T)?( ?)(.+?)( for )([1-2]?[0-9](.[0-9]?[0-9])?)( hours?)
But #7-#10 don't pass because they don't have a duration. The submatches for #1 give me what I want though
1 2 3 4 5 6 7 8 9 10 11
5 pm CST Happy Hour for 1 hour
I want every possible submatch to fill even if VBScript doesn't need it to make the regex pass. I fear this is just how it works and that I'm trying to get regex to do my parsing work for me. I considered running it through increasingly more restrictive patterns until it doesn't pass, then using the last passing pattern, but that seems kludgy.
Is it possible to get regex to fill those submatches?

I have assumed each line is all the contents in a single cell. So I am able to use anchors.
I also don't think you need as many capturing groups as you have. I set up the regex with:
Group 1 Time
Group 2 am/pm
Group 3 Time Zone
Group 4 Description
Group 5 Hours (and fractions of hours)
With your data in A2:An, the following routine parses the data into the adjacent columns. It doesn't matter if a Submatch is "not filled". You could also fill elements in an array, or whatever else you want to do. If you want more submatches, you can always either add capturing groups for the optional spaces, or change the relevant non-capturing groups to capturing groups.
Also, since the "for" is optional, I chose to use a lookahead to determine the end of "description". Description will end with either a \s+for\s+ sequence; or with the "end of line". Since I have assumed there is only one entry, and one line, per cell, the multiline and global properties are irrelevant.
One has to include spaces before and after "for" so as to avoid problems if that sequence is included in Description.
Option Explicit
'set Reference to Microsoft VBScript Regular Expressions 5.5
Sub ParseAppt()
    Dim R As Range, C As Range
    Dim RE As RegExp, MC As MatchCollection
    Dim I As Long
    Set R = Range("a2", Cells(Rows.Count, "A").End(xlUp))
    Set RE = New RegExp
    With RE
        .Pattern = "((?:1[0-2]|0?[1-9])(?::[0-5]\d)?)\s*([ap]m)?\s*([ECMP][DS]T)?\s*(.*?(?=\s+for\s+|$))(?:\s+for\s+(\d+(?:\.\d+)?)\s*hour)?"
        .IgnoreCase = True
        For Each C In R
            If .Test(C.Text) = True Then
                Set MC = .Execute(C.Text)
                For I = 0 To 4
                    C.Offset(0, I + 1) = MC(0).SubMatches(I)
                Next I
            End If
        Next C
    End With
End Sub
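
The core idea (a lazy description group that is ended by a lookahead rather than by the optional duration itself) can also be sketched in Python. This is only an approximation of the VBA routine: the time zone class is written as [ECMP], as in the question's own pattern, the helper name parse is made up, and Python reports an unfilled group as None rather than an empty submatch.

```python
import re

APPT = re.compile(
    r"((?:1[0-2]|0?[1-9])(?::[0-5]\d)?)"      # group 1: time
    r"\s*([ap]m)?"                             # group 2: am/pm
    r"\s*([ECMP][DS]T)?"                       # group 3: time zone
    r"\s*(.*?(?=\s+for\s+|$))"                 # group 4: description
    r"(?:\s+for\s+(\d+(?:\.\d+)?)\s*hour)?",   # group 5: duration
    re.IGNORECASE,
)


def parse(text):
    """Return the five groups as a tuple, or None if nothing matches."""
    m = APPT.match(text)
    return m.groups() if m else None


with_duration = parse("5pmCST Happy Hour for 1 hour")
without_duration = parse("5 Happy Hour")
print(with_duration)
print(without_duration)
```

Because the lookahead forces the lazy group to run up to either " for " or the end of the text, the description is captured in full whether or not a duration follows.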

RegEx Pattern in VB for Multiline Matches

I have already managed to get the information from a single line. I have a list of information like:
1 1 838028476391 4 23 36 P 1/820-01 *
2 1 838028476490 4 23 36 P 1/820-17 *
3 1 838028474271 4 23 36 P 1/820-21 *
4 1 838028476292 4 23 36 P 1/820-21 *
5 1 838028474263 4 23 36 P 1/820-23 *
6 1 838028473802 4 23 36 P 1/820-21 *
And I need the 12 digits numbers from every line. I tried this code:
Dim re As String
Dim re18 As String
re18 = "(\d{12})"
Dim r3 As New RegExp
r3.Pattern = re18
r3.IgnoreCase = True
r3.MultiLine = True
If r3.Test(Body) Then
    Dim m3 As MatchCollection
    Set m3 = r3.Execute(Body)
    If m3.Item(0).SubMatches.Count > 0 Then
        Dim number
        For j = 1 To m3.Count
            Set number = m3.Item(j - 1)
            MsgBox ("Number: " + number)
        Next
    End If
End If
I only get the first match. Even if I debug the macro and view m3 in the watch window, there is only 1 match. I also tried to use the quantifiers * or + after \d{12}.
How do I get this RegEx working?
And regarding RegEx I have another question: if I want to match something AFTER a special word, I would put the word at the beginning of the pattern, followed by the numbers or whatever I want. If I execute this regex, do I get the match INCLUDING the word I put at the beginning of my pattern?
Like: "BUS \d{12}" and I only want the numbers as a result but know that BUS stands before the numbers...

You need to use the Global option, not Multiline. Multiline changes the behavior of the anchors (^ and $) so they match the beginning and end of each line, not just the beginning and end of the whole text. Global is the option that tells it to find all the matches, not just the first one.
You probably don't need to use the SubMatches property either. Your regex has only the one capturing group, which captures the whole match. That means m3.Item(0).SubMatches will only contain one item, SubMatches(0), and it will be exactly the same as m3.Item(0).Value. (Notice that the index of the first group is 0, not 1 as you would expect from working with other regex tools.)
Your second question is where the SubMatches property comes in. If you wanted to find every 12-digit number that follows the word "BUS" you would use a regex like this:
BUS\s*(\d{12})
...and you would retrieve the number from each match like this:
r3.Global = True
Set m3 = r3.Execute(Body)
For Each myMatch In m3
    MsgBox "Number: " & myMatch.SubMatches(0)
Next
See this page for more info.
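
The distinction between "find every match" and "multiline anchors" is not specific to VBScript. As a rough Python analogy (Python has no Global property; search() versus findall() plays that role here), the sketch below uses three of the sample lines from the question. The string tagged is a made-up input for the "BUS" case.

```python
import re

body = (
    "1 1 838028476391 4 23 36 P 1/820-01 *\n"
    "2 1 838028476490 4 23 36 P 1/820-17 *\n"
    "3 1 838028474271 4 23 36 P 1/820-21 *\n"
)

# search() stops at the first hit; findall() returns every hit.
first_only = re.search(r"\d{12}", body).group(0)
all_numbers = re.findall(r"\d{12}", body)

# MULTILINE only changes what ^ and $ mean; it does not add matches
# for a pattern that has no anchors.
line_starts_default = re.findall(r"^\d", body)
line_starts_multiline = re.findall(r"^\d", body, re.MULTILINE)

# With one capturing group, findall() returns just the captured part,
# so the literal "BUS" is required but not returned.
tagged = "BUS 838028476391 and BUS838028476490"
after_bus = re.findall(r"BUS\s*(\d{12})", tagged)

print(all_numbers)
print(after_bus)
```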
