UDF Regex - yyyy only

I am just learning regex, and I need help spitting out the matches produced by my regex code. I found some very useful resources here for outputting anything that is not matched, but I want to output only the parts of a cell that do match. I am looking for dates in cells, which may be a single yyyy date, a yyyy-yy range, or similar (as shown in the sample data below).
Sample data:
1951/52
1909-13
2005-2014
7 . (1989)-
1 (1933/34)-2 (1935/36)
1979-2012/2013
Current function code (a snippet from an existing post here, which returns the replacement value instead of what was matched):
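Function simpleCellRegex(Myrange As Range) As String
' Early binding: needs a reference to "Microsoft VBScript Regular Expressions 5.5"
Dim regEx As New RegExp
Dim strPattern As String
Dim strInput As String
Dim strReplace As String
Dim strOutput As String
strPattern = "([12][0-9]{3}[/][0-9]{2,4})|([12][0-9]{3}[-][0-9]{2,4})|([12][0-9]{3})"
' ... (remainder of the function not included in the snippet)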

You may use
\b[12][0-9]{3}(?:[,/-][0-9]{2,4})*\b
Note that the \b anchors can be removed if you do not need whole-word matching.
Pattern details:
\b - leading word boundary (the preceding char must be either a non-word char or the start of string)
[12][0-9]{3} - 1 or 2 followed by any 3 digits
(?:[,/-][0-9]{2,4})* - zero or more sequences ((?:...)*) of:
[,/-] - a comma, slash or hyphen character
[0-9]{2,4} - any 2 to 4 digits
\b - trailing word boundary (there must be a non-word char or the end of string after).
Sample VBA code to grab all those values using RegExp#Execute:
Sub FetchDateLikeStrs()
' Requires a reference to "Microsoft VBScript Regular Expressions 5.5"
Dim cellContents As String
Dim reg As RegExp
Dim mc As MatchCollection
Dim m As Match
Set reg = New RegExp
reg.Pattern = "\b[12][0-9]{3}(?:[,/-][0-9]{2,4})*\b"
reg.Global = True ' return every match, not only the first one
cellContents = "1951/52 1909-13 2005-2014 7 . (1989)- 1 (1933/34)-2 (1935/36) 1979-2012/2013 1951,52"
Set mc = reg.Execute(cellContents)
For Each m In mc
Debug.Print m.Value ' print each date-like substring
Next
End Sub
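To check the pattern outside Excel, here is a minimal Python sketch of the same idea. It assumes Python's re module as a stand-in for VBScript.RegExp (both treat \b, the character classes and (?:...)* the same way for this pattern). The helper name extract_year_ranges is hypothetical: it returns the matched parts of a cell joined into one string, which is what the question's UDF was meant to produce.

```python
import re

# Same pattern as the VBA answer; findall returns whole matches because the group is non-capturing.
YEAR_RE = re.compile(r"\b[12][0-9]{3}(?:[,/-][0-9]{2,4})*\b")


def extract_year_ranges(text, sep=", "):
    """Hypothetical UDF-style helper: join every year-like match into one string."""
    return sep.join(YEAR_RE.findall(text))


sample = ("1951/52 1909-13 2005-2014 7 . (1989)- "
          "1 (1933/34)-2 (1935/36) 1979-2012/2013 1951,52")
print(YEAR_RE.findall(sample))
print(extract_year_ranges("1 (1933/34)-2 (1935/36)"))  # 1933/34, 1935/36
```

Note that "1" and "2" in "1 (1933/34)-2 (1935/36)" are skipped because [12][0-9]{3} needs four digits in a row.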

Related

How to exclude an amount?

I have two strings with the same amount:
Price $22.00
Price Max=$22.00
Can someone please advise how I can modify this regex pattern to make sure that the price with a "Max" in front of it will be ignored?
(?:MAX=|MAX=\s)[$]?[0-9]{0,2}?[,]?[0-9]{1,3}[.][0-9]{0,2}
You may capture MAX= into an optional capturing group and, once all matches are found, check whether it matched. Only keep the value if Group 1 did not match:
Dim strPattern As String: strPattern = "(MAX=\s*)?\$\d[\d.,]*"
Dim regEx As Object
Dim ms As Object, m As Object
Set regEx = CreateObject("VBScript.RegExp")
regEx.Global = True
regEx.IgnoreCase = True ' the input has "Max=", so MAX= must match case-insensitively
regEx.Pattern = strPattern
Dim t As String
t = "Price $24.00 Price Max=$22.00 "
Set ms = regEx.Execute(t)
For Each m In ms
If Len(m.SubMatches(0)) = 0 Then ' Group 1 (MAX=) did not participate
Debug.Print m.Value
End If
Next
The (MAX=\s*)?\$\d[\d.,]* pattern matches MAX= followed by 0+ whitespaces into an optional group (it matches 1 or 0 times). \$\d[\d.,]* then matches a $, a digit, and any 0+ digits, commas and dots. If Len(m.SubMatches(0)) = 0 Then checks whether Group 1 is empty, and if it is, the match is valid. Since the input contains Max= rather than MAX=, .IgnoreCase = True is needed for Group 1 to match it.
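The same optional-group filter, sketched in Python under the assumption that re behaves like VBScript.RegExp here. One difference: Python returns None for a group that did not participate, where VBScript's SubMatches gives an empty value, so the check below uses "not m.group(1)" to cover both. The function name prices_without_max is hypothetical.

```python
import re

# Optional Group 1 captures "MAX=" plus whitespace; IGNORECASE so "Max=" is caught too.
PRICE_RE = re.compile(r"(MAX=\s*)?\$\d[\d.,]*", re.IGNORECASE)


def prices_without_max(text):
    """Keep a match only when the optional MAX= group did not take part in it."""
    return [m.group(0) for m in PRICE_RE.finditer(text) if not m.group(1)]


print(prices_without_max("Price $24.00 Price Max=$22.00 "))  # ['$24.00']
```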
One way to do it could be to match what you don't want and to capture what you do want in a capturing group using an alternation:
Max=\s*\$[0-9]+\.[0-9]+|(\$[0-9]+\.[0-9]+)
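A minimal Python sketch of this alternation trick (Python's re standing in for the VBA regex engine; the name wanted_prices is hypothetical). The unwanted Max= prices are consumed by the left branch, so only matches where Group 1 is set are kept. Like the pattern above, it is case-sensitive on "Max=".

```python
import re

# Left branch eats the unwanted "Max=" price; only the right branch captures (Group 1).
ALT_RE = re.compile(r"Max=\s*\$[0-9]+\.[0-9]+|(\$[0-9]+\.[0-9]+)")


def wanted_prices(text):
    """Return Group 1 of each match, skipping matches produced by the Max= branch."""
    return [m.group(1) for m in ALT_RE.finditer(text) if m.group(1) is not None]


print(wanted_prices("Price $22.00 Price Max=$22.00"))  # ['$22.00']
```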

Get the non-matching part of the pattern through a RegEx

In the topic How to extract ad sizes from a string with excel regex, the idea is to "strip" out the numbers separated by an x using a RegEx.
Thus from:
uni3uios3_300x250_ASDF.html
I want to achieve through RegEx:
300x250
I have managed to achieve the exact opposite, and I have been struggling for some time to get what I need.
This is what I have until now:
Public Function regExSampler(s As String) As String
Dim regEx As Object
Dim inputMatches As Object
Dim regExString As String
Set regEx = CreateObject("VBScript.RegExp")
With regEx
.Pattern = "(([0-9]+)x([0-9]+))"
.IgnoreCase = True
.Global = True
Set inputMatches = .Execute(s)
If regEx.test(s) Then
regExSampler = .Replace(s, vbNullString)
Else
regExSampler = s
End If
End With
End Function
Public Sub TestMe()
Debug.Print regExSampler("uni3uios3_300x250_ASDF.html")
Debug.Print regExSampler("uni3uios3_34300x25_ASDF.html")
Debug.Print regExSampler("uni3uios3_8x4_ASDF.html")
End Sub
If you run TestMe, you would get:
uni3uios3__ASDF.html
uni3uios3__ASDF.html
uni3uios3__ASDF.html
And these are exactly the parts I want to strip out with the RegEx.
Change the If block to:
If regEx.Test(s) Then
regExSampler = inputMatches(0).Value ' text of the first match, e.g. "300x250"
Else
regExSampler = s
End If
and your results will be:
300x250
34300x25
8x4
This works because inputMatches holds the MatchCollection returned by .Execute, i.e. every substring that matched the pattern; inputMatches(0) is the first of those matches.
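For comparison, a Python sketch of "return the first match, else the input" (assumption: re.search plays the role of inputMatches(0), since both give the first match; reg_ex_sampler is a hypothetical name mirroring regExSampler).

```python
import re

# Same size pattern as the question; IGNORECASE mirrors .IgnoreCase = True.
SIZE_RE = re.compile(r"([0-9]+)x([0-9]+)", re.IGNORECASE)


def reg_ex_sampler(s):
    """Return the first WIDTHxHEIGHT match, or the input unchanged when there is none."""
    m = SIZE_RE.search(s)
    return m.group(0) if m else s


for name in ("uni3uios3_300x250_ASDF.html",
             "uni3uios3_34300x25_ASDF.html",
             "uni3uios3_8x4_ASDF.html"):
    print(reg_ex_sampler(name))
```

The digits in "uni3" and "uios3" do not match because they are not followed by an x.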
As requested by the OP, I'm posting this as an answer:
Solution:
^.*\D(?=\d+x\d+)|\D+$
Demonstration: regex101.com
Explanation:
^.*\D - Here we're matching as many characters as possible from the start of the string, ending with a non-digit (\D) character; because of the lookahead that follows, this is the last non-digit directly before the size.
(?=\d+x\d+) - This is a positive lookahead. It means that the previous pattern (^.*\D) should only match if followed by the pattern described inside it (\d+x\d+). The lookahead itself doesn't consume any characters, so the \d+x\d+ part is not included in the match.
\d+x\d+ - This one should be easy to understand because it's equivalent to [0-9]+x[0-9]+. As you can see, \d is a token that represents any digit character.
\D+$ - This pattern matches one or more non-digit characters until it reaches the end of the string.
Finally, both patterns are joined by an OR (|) alternation, so the whole regex matches either one pattern or the other. Replacing every match with an empty string (with .Global = True) removes the prefix and the suffix, leaving only the size, e.g. 300x250.
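A Python sketch of the "replace everything else with nothing" approach, assuming re.sub behaves like a global RegExp.Replace for this pattern (ad_size is a hypothetical name).

```python
import re

# Left branch: everything up to the last non-digit before the size.
# Right branch: the trailing run of non-digits.
STRIP_RE = re.compile(r"^.*\D(?=\d+x\d+)|\D+$")


def ad_size(s):
    """Delete every match; what survives is the WIDTHxHEIGHT part."""
    return STRIP_RE.sub("", s)


print(ad_size("uni3uios3_300x250_ASDF.html"))  # 300x250
```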

Use Regex to Split Numbered List array into Numbered List Multiline

I am trying to learn Regex to answer a question on Stack Overflow in Portuguese.
Input (an array or a string in a cell, so .MultiLine = False):
1 One without dot. 2. Some Random String. 3.1 With SubItens. 3.2 With number 0n mid. 4. Number 9 incorrect. 11.12 More than one digit. 12.7 Ending (no word).
Output
1 One without dot.
2. Some Random String.
3.1 With SubItens.
3.2 With number 0n mid.
4. Number 9 incorrect.
11.12 More than one digit.
12.7 Ending (no word).
My idea was to use Regex with Split, but I wasn't able to implement the example in Excel.
Imports System.Text.RegularExpressions
Module Example
Public Sub Main()
Dim input As String = "plum-pear"
Dim pattern As String = "(-)"
Dim substrings() As String = Regex.Split(input, pattern) ' Split on hyphens.
For Each match As String In substrings
Console.WriteLine("'{0}'", match)
Next
End Sub
End Module
' The method writes the following to the console:
' 'plum'
' '-'
' 'pear'
After some reading, I tried the expression /([0-9]{1,2})([.]{0,1})([0-9]{0,2})/igm on the input in the RegExr website.
And the following is obtained:
Is there a better way to do this? Is the regex correct, or is there a better way to build it? The examples I found on Google didn't make it clear to me how to use RegEx with Split correctly.
Maybe I am confusing the logic of the Split function: I wanted to get the split index, with the regex as the separator string.
I can require that each item ends with a word and a period.
Use
\d+(?:\.\d+)*[\s\S]*?\w+\.
Details
\d+ - 1 or more digits
(?:\.\d+)* - zero or more sequences of:
\. - dot
\d+ - 1 or more digits
[\s\S]*? - any 0+ chars, as few as possible, up to the first...
\w+\. - 1+ word chars followed by a dot
Here is some sample VBA code:
Dim str As String
Dim objMatches As Object
Dim objRegExp As Object
Dim m As Object
str = " 1 One without dot. 2. Some Random String. 3.1 With SubItens. 3.2 With Another SubItem. 4. List item. 11.12 More than one digit."
Set objRegExp = New RegExp ' needs the "Microsoft VBScript Regular Expressions 5.5" reference; or use CreateObject("VBScript.RegExp")
objRegExp.Pattern = "\d+(?:\.\d+)*[\s\S]*?\w+\."
objRegExp.Global = True ' collect every list item, not only the first
Set objMatches = objRegExp.Execute(str)
If objMatches.Count <> 0 Then
For Each m In objMatches
Debug.Print m.Value
Next
End If
NOTE
You may require the matches to only stop at the word + . that are followed with 0+ whitespaces and a number using \d+(?:\.\d+)*[\s\S]*?[a-zA-Z]+\.(?=\s*(?:\d+|$)).
The (?=\s*(?:\d+|$)) positive lookahead requires the presence of 0+ whitespaces (\s*) followed with 1+ digits (\d+) or end of string ($) immediately to the right of the current location.
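To see what the lookahead changes, here is a Python comparison of both patterns (Python's re standing in for VBScript.RegExp). The input is a hypothetical one with an abbreviation ("e.g.") inside the first item: the basic pattern stops at the first "word." it sees, while the stricter one only stops before a number or the end of the string.

```python
import re

BASIC_RE = re.compile(r"\d+(?:\.\d+)*[\s\S]*?\w+\.")
STRICT_RE = re.compile(r"\d+(?:\.\d+)*[\s\S]*?[a-zA-Z]+\.(?=\s*(?:\d+|$))")

# Hypothetical input: an abbreviation ("e.g.") inside the first item.
text = "1 See e.g. the docs. 2. Next item."
print(BASIC_RE.findall(text))   # ['1 See e.', '2. Next item.']
print(STRICT_RE.findall(text))  # ['1 See e.g. the docs.', '2. Next item.']
```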
VBA's built-in Split only accepts a literal delimiter, but with a regex-based split that supports lookahead (such as .NET's Regex.Split), this one may work, assuming there are no digits except in the indexes:
\s(?=\d)
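A Python sketch of that split, assuming re.split as a stand-in for a regex-based split such as Regex.Split. The second test line shows the stated assumption at work: a digit inside an item text (as in the question's "Number 9 incorrect.") causes an unwanted split.

```python
import re

# Split on a whitespace char that is immediately followed by a digit (lookahead keeps the digit).
SPLIT_RE = re.compile(r"\s(?=\d)")

text = ("1 One without dot. 2. Some Random String. 3.1 With SubItens. "
        "3.2 With Another SubItem. 4. List item. 11.12 More than one digit.")
for item in SPLIT_RE.split(text):
    print(item)
```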

RegEx to extract a word from mail's body

I need to extract a word from incoming mail's body.
I wrote a regex after consulting some sites, but it neither returns a result nor throws an error.
Example: Description: sample text
I want only the first word after the colon.
Dim reg1 As RegExp
Dim M1 As MatchCollection
Dim M As Match
Dim EAI As String
Set reg1 = New RegExp
With reg1
.Pattern = "Description\s*[:]+\s*(\w*)\s*"
.Global = False
End With
If reg1.Test(Item.Body) Then
Set M1 = reg1.Execute(Item.Body)
For Each M In M1
EAI = M.SubMatches(1)
Next
End If
Note that your pattern works, though it is better written as:
Description\s*:+\s*(\w+)
It matches Description, then 0+ whitespaces, 1+ : symbols, again 0 or more whitespaces, and then captures into Group 1 one or more word characters (letters, digits or _).
SubMatches is zero-based, so the Group 1 value is stored in M.SubMatches(0), not M.SubMatches(1). Besides, you do not need to run .Test(): if there are no matches, there is nothing to iterate over, and you actually want a single match anyway.
Thus, just use
Set M1 = reg1.Execute(Item.body)
If M1.Count > 0 Then
EAI = M1(0).SubMatches(0)
End If
Where M1(0) is the first match and .SubMatches(0) is the text residing in the first group.
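A Python sketch of the same "first match, first group" lookup (assumption: re.search stands in for M1(0)). Python numbers groups from 1 via group(1), while VBScript's SubMatches is zero-based; both refer to the same capture group. The function name is hypothetical.

```python
import re

DESC_RE = re.compile(r"Description\s*:+\s*(\w+)")


def first_word_after_description(body):
    """Return Group 1 of the first match, or None when the body has no match."""
    m = DESC_RE.search(body)
    return m.group(1) if m else None


print(first_word_after_description("Description: sample text"))  # sample
```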

Can I store the captured group strings/digits from a regex search?

I want to store the contents of the captured groups from a regex search in variables.
' Requires: Imports System.Text.RegularExpressions
Dim input As String = "asdfd sdf dsf fdsf <disp-formula id=""deqn1-3""> fdsf fds df"
Dim regex As Regex = New Regex("<disp-formula id=""deqn(\d+)-(\d+)"">")
Dim match As Match = regex.Match(input)
If match.Success Then
' put the values represented by (\d+) and (\d+) in two variables and then use them in a loop
End If
Can that be done in VB.NET? If so, how?
Simply use the Groups property of the Match object:
Dim g1 = match.Groups(1).Value ' 1 in your sample
Dim g2 = match.Groups(2).Value ' 3 in your sample
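Group values come back as text, so they need converting before they can drive a loop. Below is a minimal Python sketch of that step, assuming Python's m.group(n) as a stand-in for .NET's match.Groups(n).Value; in VB.NET the equivalent conversion would be Integer.Parse(match.Groups(1).Value). The function name and the loop body are hypothetical, since the question does not say what the loop does.

```python
import re

INPUT_TEXT = 'asdfd sdf dsf fdsf <disp-formula id="deqn1-3"> fdsf fds df'
FORMULA_RE = re.compile(r'<disp-formula id="deqn(\d+)-(\d+)">')


def equation_range(text):
    """Return (group 1, group 2) converted to ints, or None when there is no match."""
    m = FORMULA_RE.search(text)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


bounds = equation_range(INPUT_TEXT)
if bounds is not None:
    first, last = bounds
    for n in range(first, last + 1):  # hypothetical loop: deqn1, deqn2, deqn3
        print(f"deqn{n}")
```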