Match number between 2 dates - regex

I have a text and I need to extract a number that is between 2 dates. I can't show the full text, so I will only use the part I need, but keep in mind it's part of a bigger text.
12/14/2020 355345 12/14/2020
From that, I need to get '355345'. I don't have anything to show of what I've tried, because I was working on getting the text before a sentence until I realized that this is the only place where the number is between 2 dates.
Thanks!

Here's a snippet that might help:
Suppose the input is this:
Imports System.Text ' needed for StringBuilder
Imports System.Text.RegularExpressions
'...
Dim input As New StringBuilder
input.AppendLine("12/14/2020 355345 12/14/2020")
input.AppendLine("12/13/2020 425345 12/13/2020")
input.AppendLine("12/20/2020 93488557 12/20/2020")
input.AppendLine("12/21/2020 4 12/21/2020")
input.AppendLine("12/20/2020 3443 12/20/2020")
'...
Use RegEx to extract the numbers between the two dates as follows:
Dim patt = "(\d+\/\d+\/\d+)\s?(\d+)\s?(\d+\/\d+\/\d+)"
For Each m In Regex.
Matches(input.ToString, patt, RegexOptions.Multiline).
Cast(Of Match)
Console.WriteLine(m.Groups(2).Value)
Next
This will capture three groups. Example for the first match:
m.Groups(1).Value : 12/14/2020 the first date.
m.Groups(2).Value : 355345 the number in between.
m.Groups(3).Value : 12/14/2020 the second date.
If you have no use for the captured dates, there is no need to group them; use the following pattern instead:
Dim patt = "\d+\/\d+\/\d+\s?(\d+)\s?\d+\/\d+\/\d+"
For Each m In Regex.
Matches(input.ToString, patt, RegexOptions.Multiline).
Cast(Of Match)
Console.WriteLine(m.Groups(1).Value)
Next
And you will get the number between the two dates in Group 1.
The output of both is:
355345
425345
93488557
4
3443
Also, as @AndrewMorton pointed out in his comments, using explicit quantifiers in the pattern is a good idea, so that input such as 1234/239994/2293 is not mistaken for a date:
Dim patt = "\d{1,2}/\d{1,2}/\d{4}\s(\d{1,})\s\d{1,2}/\d{1,2}/\d{4}"
For Each m In Regex.
Matches(input.ToString, patt, RegexOptions.Multiline).
Cast(Of Match)
Console.WriteLine(m.Groups(1).Value)
Next

If you can safely check for numbers and slashes, then a pattern like this should work:
\d\d/\d\d/\d\d\d\d +(\d+) +\d\d/\d\d/\d\d\d\d
...where capture group 1 would hold the number being sought. If you need to validate that the values are actually dates, well... you can do it with regex to a degree, but the pattern becomes very difficult to read.
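If the dates do need validating, one way to keep the pattern readable is to let the regex find date-shaped text and leave the calendar check to a date parser. A minimal Python sketch, assuming the dates are month/day/year; the names `is_date` and `numbers_between_dates` are made up for illustration:

```python
import re
from datetime import datetime

# Date-shaped text, a number, date-shaped text (same idea as the patterns above).
PATTERN = re.compile(r"(\d{1,2}/\d{1,2}/\d{4})\s+(\d+)\s+(\d{1,2}/\d{1,2}/\d{4})")

def is_date(s):
    """Return True if s parses as a real month/day/year date."""
    try:
        datetime.strptime(s, "%m/%d/%Y")
        return True
    except ValueError:
        return False

def numbers_between_dates(text):
    """Collect the numbers that sit between two valid dates."""
    return [m.group(2) for m in PATTERN.finditer(text)
            if is_date(m.group(1)) and is_date(m.group(3))]

text = "12/14/2020 355345 12/14/2020\n99/99/2020 111 99/99/2020\n12/21/2020 4 12/21/2020"
print(numbers_between_dates(text))  # ['355345', '4']
```

The middle line looks like a date to the regex but is rejected by the parser, so only the first and last numbers come back.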

Related

Find number of Instances for few words in string while ignoring other few words using regex

Hi, I am using regex in Matlab.
I need to find the number of hits for a few words while ignoring a few other words.
What I have tried so far:
String = 'Sunday:Monday:Tuesday:Wednesday:Thursday:Friday:Saturday:Sun:Mon:Tue:Wed:,Thu:,Fri:,Sat:';
Output = regexp( String,'^(?!.*(,Sun:|,Sunday:)).*(Sun:|Sunday:)' )
The output of the regexp above is true, but I need it to be 2, since there are 2 hits: Sun: and Sunday:.
In the next scenario:
String = 'Sunday:Monday:Tuesday:Wednesday:Thursday:Friday:Saturday:Sun:Mon:Tue:Wed:,Thu:,Fri:,Sat:';
Output = regexp( String,'^(?!.*(,Fri:|,Friday:)).*(Fri:|Friday:)' )
The output of the regexp above is false, but I need it to be 1, since there is 1 hit: Friday:.
I also tried:
regexp( String,'^(?!.*(,Sun:|,Sunday:)).*(Sun:|Sunday:)' ,'match')
But it gives the whole string as output.
I am confused about how to get the number of hits while ignoring other words. Help would be appreciated. As far as I know, regexp in Matlab works the same as regular regex.
You can use
(?<!,)Fri(?:day)?:
It matches
(?<!,) - a location not immediately preceded with ,
Fri - Fri
(?:day)? - an optional day string
: - a colon.
If you allow some redundancy, you may build the pattern like this:
(?<!,)(Fri:|Sunday:)
It will match Fri: or Sunday: not immediately preceded with a comma.
Unless you really need to use regexp, something like this will be easier to maintain:
Output = sum(ismember(strsplit(String,':'),{'Sunday','Sun'}))
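To check that the lookbehind pattern gives the counts asked for, here is a small Python sketch (Python rather than Matlab, so treat it as an illustration of the pattern, not of Matlab's regexp call). The helper name `count_hits` is made up:

```python
import re

text = ("Sunday:Monday:Tuesday:Wednesday:Thursday:Friday:Saturday:"
        "Sun:Mon:Tue:Wed:,Thu:,Fri:,Sat:")

def count_hits(pattern, s):
    """Count non-overlapping matches of pattern in s."""
    return len(re.findall(pattern, s))

# (?<!,) rejects any hit that is immediately preceded by a comma.
sun_hits = count_hits(r"(?<!,)Sun(?:day)?:", text)
fri_hits = count_hits(r"(?<!,)Fri(?:day)?:", text)
print(sun_hits, fri_hits)  # 2 1
```

Sunday: and Sun: both count, while ,Fri: is skipped because of the comma in front of it.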

Access multiple captures of one capture group in substitution string

Suppose I have the regex (\d)+.
In .NET I can access all captures of this capture group using match.Groups[1].Captures.
Can I also access these captures in a substitution string?
So, for example, for the input string 523, I need to use 5, 2 and 3 in my substitution string (and not just 3, which is $1).
If you intend to capture each digit in its own capturing group, then you need to write a separate capturing group for every digit, like this:
(\d)(\d)(\d)
NOTE: This does not scale very well, and you cannot match numbers of any length other than 3 digits. In other words, no match on either 23 or 345667!
A good page with a long and detailed explanation of why this can't be done with (\d)+ can be found here:
https://www.regular-expressions.info/captureall.html
So if this is indeed what you want, then you need to write your own loop that searches the string for every digit separately.
If, on the other hand, you need to capture the whole number and not the individual digits, then you simply put the + sign in the wrong position. I think you should write:
(\d+)
I think the OP wants to get every single digit match separately.
Perhaps this will help you then:
' Create a list to put the resulting matches in
Dim ResultList As StringCollection = New StringCollection()
Dim RegexObj As New Regex("(\d)")
Dim MatchResult As Match = RegexObj.Match(strName)
While MatchResult.Success
ResultList.Add(MatchResult.Groups(1).Value)
' Console.WriteLine(MatchResult.Groups(1).Value)
MatchResult = MatchResult.NextMatch()
End While
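For comparison, the same per-digit approach sketched in Python. Note this shows the behaviour of Python's re module, not .NET: there a repeated group only exposes its last iteration, so the digits have to be collected by matching them one at a time. The callback `expand` is a hypothetical replacement function:

```python
import re

text = "523"

# A repeated group keeps only its final iteration in Python's re module.
last_only = re.match(r"(\d)+", text).group(1)

# Matching one digit at a time yields every digit.
digits = re.findall(r"\d", text)

def expand(match):
    """Hypothetical replacement that uses every digit of the matched number."""
    return "-".join(match.group(0))

result = re.sub(r"\d+", expand, "id 523 end")
print(last_only, digits, result)  # 3 ['5', '2', '3'] id 5-2-3 end
```

A replacement callback like this is one way to use all the digits in the substitution, since a plain replacement string can only refer to the group's last capture.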

Replacing everything but the matching regex string

I've searched for this answer but haven't found an answer that exactly works.
I have the following pattern where the hashes are any digit: 102###-###:#####-### or 102###-###:#####-####
It must start with 102 and the last set in the pattern can either be 3 or 4 digits.
The problem is that a string can contain between 1 and 5 of these patterns, with any sort of characters in between (spaces, letters, etc.). The regex I posted below matches the patterns well, but I am trying to select everything that is NOT this pattern so I can remove it. The end goal is to extract all the patterns and output them comma delimited (Pattern, Pattern, Pattern). How do I accomplish this with regex? Perhaps there is a better way than the approach I'm taking? Thanks. This is using VBA.
Regex For Pattern:(\D102\d{3}-\d{3}:\d{5}-\d{3,4}\D)
String Example: type:102456-345:56746-234 102456-345:56746-2343 FollowingCell#:102456-345:56746-234 exampletext##$% 102456-345:56746-2345 stuff
There's no need to match everything you don't want just to remove it; that's more difficult. Instead, match what you need and do whatever you want with it.
The following regex matches only the pattern:
(?<!\d)102\d{3}-\d{3}:\d{5}-\d{3,4}(?!\d)
Example code (VB.NET):
Imports System.Text.RegularExpressions
Module Module1
Sub Main()
Dim sourcestring as String = "type:102456-345:56746-234 102456-345:56746-2343 FollowingCell#:102456-345:56746-234 exampletext##$% 102456-345:56746-2345 stuff"
Dim re As Regex = New Regex("(?<!\d)102\d{3}-\d{3}:\d{5}-\d{3,4}(?!\d)")
Dim mc as MatchCollection = re.Matches(sourcestring)
For each m as Match in mc
Console.WriteLine(m.Groups(0).Value)
Next
End Sub
End Module
Result:
102456-345:56746-234
102456-345:56746-2343
102456-345:56746-234
102456-345:56746-2345
"I am trying to select everything that is NOT this pattern so I can remove it. The end goal is to extract all the patterns and just have all the patterns comma delimited as the output"
If you want to extract the patterns, then just do that, without removing everything around them. Example in Python: (Posted before the question's language was specified, but I'm sure the same can be done in VBA.)
>>> import re
>>> p = r"102\d{3}-\d{3}:\d{5}-\d{3,4}"
>>> text = "type:102456-345:56746-234 102456-345:56746-2343 FollowingCell#:102456-345:56746-234 exampletext##$% 102456-345:56746-2345 stuff"
>>> ",".join(re.findall(p, text))
'102456-345:56746-234,102456-345:56746-2343,102456-345:56746-234,102456-345:56746-2345'
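The digit guards (?<!\d) and (?!\d) from the other answer can be combined with the findall approach, so that a pattern embedded in a longer run of digits is not picked up. A short Python sketch; the function name `extract_codes` and the sample string are made up for illustration:

```python
import re

# (?<!\d) and (?!\d) make sure the pattern is not part of a longer digit run.
PATTERN = re.compile(r"(?<!\d)102\d{3}-\d{3}:\d{5}-\d{3,4}(?!\d)")

def extract_codes(text):
    """Return every matching pattern, comma delimited."""
    return ", ".join(PATTERN.findall(text))

sample = ("type:102456-345:56746-234 102456-345:56746-2343 x "
          "9102456-345:56746-234 102456-345:56746-23456")
print(extract_codes(sample))  # 102456-345:56746-234, 102456-345:56746-2343
```

The last two entries in the sample are skipped: one has an extra leading digit and the other has five trailing digits.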

Excel VBA Regex Check For Repeated Strings

I have some user input that I want to validate for correctness. The user should input 1 or more sets of characters, separated by commas.
So these are valid input
COM1
COM1,COM2,1234
these are invalid
COM -- only 3 characters
COM1,123 -- one set is only 3 characters
COM1.1234,abcd -- a dot separator not comma
I googled for a regex pattern to this and found a possible pattern that tested for a recurring instance of any 3 characters, and I modified like so
/^(.{4,}).*\1$/
but this is not finding matches.
I can manage the last comma that may or may not be there before passing to the test so that it is always there.
Preferably, I would like to test for letters (any case) and numbers only, but I can live with any characters.
I know I could easily do this in straight VBA by splitting the input on a comma delimiter and looping through each character of each array element, but regex seems more efficient, and I will have more cases that have slightly different patterns, so parameterising the regex for that would be better design.
TIA
I believe this does what you want:
^([A-Za-z0-9]{4},)*[A-Za-z0-9]{4}$
It matches the start of the line, then zero or more groups of four letters or digits each followed by a comma, then one final group of four letters or digits, then the end of the line.
You can play around with it here: https://regex101.com/r/Hdv65h/1
The regular expression
"^[\w]{4}(,[\w]{4})*$"
should work.
You can check whether it works for all your cases using the following Sub, assuming your test strings are in cells A1 through A5 on the spreadsheet:
Sub findPattern()
    ' Requires a reference to "Microsoft VBScript Regular Expressions 5.5"
    Dim regEx As New RegExp
    regEx.Global = True
    regEx.IgnoreCase = True
    regEx.Pattern = "^[\w]{4}(,[\w]{4})*$"
    Dim i As Integer
    Dim val As String
    Dim mat As MatchCollection
    ' Test each of the cells A1 through A5 against the pattern
    For i = 1 To 5
        val = Trim(Cells(i, 1).Value)
        Set mat = regEx.Execute(val)
        If mat.Count = 0 Then
            MsgBox ("No match found for " & val)
        Else
            MsgBox ("Match found for " & val)
        End If
    Next
End Sub
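Since the question mentions parameterising the pattern for other cases, here is one way that could look, sketched in Python rather than VBA. The helper `build_validator` and its `field_len` parameter are assumptions for illustration, and letters and digits only are allowed (no underscore, unlike \w):

```python
import re

def build_validator(field_len=4):
    """Build a regex for comma-separated groups of exactly field_len letters/digits."""
    field = rf"[A-Za-z0-9]{{{field_len}}}"
    return re.compile(rf"{field}(?:,{field})*")

def is_valid(s, validator=build_validator()):
    """Return True if the whole input matches the validator."""
    return validator.fullmatch(s) is not None

for candidate in ["COM1", "COM1,COM2,1234", "COM", "COM1,123", "COM1.1234,abcd"]:
    print(candidate, is_valid(candidate))
```

With the examples from the question, the first two inputs are accepted and the last three are rejected.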

1 to 5 of the same groups in REGEX

For a string such as:
abzyxcabkmqfcmkcde
Notice the string patterns between ab and c: zyx and kmqf. To capture the first string pattern:
ab([a-z]{3,5})c
Is it possible to match both of the groups from the sample string? Actually, there should be 1 to 5 groups.
Note: python style regex.
You can verify that a given string conforms to the 1-5 repetitions of ab([a-z]{3,5})c using this regex
(?:ab([a-z]{3,5})c){1,5}
or this one if there are characters expected between the groups
(?:ab([a-z]{3,5})c.*?){1,5}
However, you will only be able to extract the last matching group from that string, not any of the previous ones. To get a previous one, you need to use hsz's approach.
Just match all results, i.e. with the g flag:
/ab([a-z]{3,5})c/g
or some method like in Python:
re.findall(pattern, string, flags=0)
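Putting the two answers together in Python, using the sample string from the question: the repeated pattern checks the overall shape but keeps only the last group, while findall with the single pattern returns every group.

```python
import re

sample = "abzyxcabkmqfcmkcde"
single = r"ab([a-z]{3,5})c"
repeated = r"(?:ab([a-z]{3,5})c){1,5}"

# The repeated form matches the run of groups at the start of the string,
# but group 1 only holds the last repetition.
m = re.match(repeated, sample)
print(m.group(0))  # abzyxcabkmqfc
print(m.group(1))  # kmqf

# findall with the single pattern returns the capture from every match.
groups = re.findall(single, sample)
print(groups)  # ['zyx', 'kmqf']
```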