Regular Expression writing for known pattern with dashes - regex

I need to write a regular expression for pattern matching for VB.NET. I need to have the Regex to look for a pattern like 12345-1234-12345-123, including the dashes. The numbers can be any variation. The value is stored as a varchar. Not sure how close or far my example is below. Any help/guidance is much appreciated.
Protected Sub Button1_Click(sender As Object, e As System.EventArgs) Handles Button1.Click
    Dim testString As String = "12345-1234-12345-123"
    Dim testNumberWithDashesRegEx As Regex = New Regex("^\d{5}-d{4}-d{5}-\d{3}$")
    Dim regExMatch As Match = testNumberWithDashesRegEx.Match(testString)
    If regExMatch.Success Then
        Label1.Text = "There is a match."
    Else
        Label1.Text = "There is no match."
    End If
End Sub

Let's break down this regex:
^\d{5}-d{4}-d{5}-\d{3}$
^: Match at start of target string
\d: match character class of digits 0-9
-: match dash (-) character
d: match the letter "d"
{5}: match the preceding class 5 times
$: Match at the end of target string.
Everything looks good to me, except you should change your plain "d" to "\d":
^\d{5}-\d{4}-\d{5}-\d{3}$
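A quick way to confirm the difference is to run both patterns against the sample value. The sketch below uses Python's re module rather than VB.NET, on the assumption that these basic constructs (\d, {n}, ^, $) behave the same way in both engines; the names CORRECTED, ORIGINAL and is_dashed_number are made up for illustration.

```python
import re

# Corrected pattern from the answer: every group uses \d.
CORRECTED = re.compile(r"^\d{5}-\d{4}-\d{5}-\d{3}$")
# Pattern as posted in the question: the two middle groups expect literal "d" characters.
ORIGINAL = re.compile(r"^\d{5}-d{4}-d{5}-\d{3}$")


def is_dashed_number(value: str) -> bool:
    """Return True when value has the shape 5-4-5-3 digits separated by dashes."""
    return CORRECTED.match(value) is not None


if __name__ == "__main__":
    sample = "12345-1234-12345-123"
    print(is_dashed_number(sample))            # corrected pattern
    print(ORIGINAL.match(sample) is not None)  # pattern from the question
```

The posted pattern only matches strings such as 12345-dddd-ddddd-123, which is why the test string in the question reports no match.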

Related

Get the non-matching part of the pattern through a RegEx

Following on from the topic "How to extract ad sizes from a string with excel regex", the idea is to get the numbers separated by an x by stripping everything else through a RegEx.
Thus from:
uni3uios3_300x250_ASDF.html
I want to achieve through RegEx:
300x250
I have managed to achieve the exact opposite, and I have been struggling for some time to get what I need.
This is what I have until now:
Public Function regExSampler(s As String) As String
    Dim regEx As Object
    Dim inputMatches As Object
    Dim regExString As String
    Set regEx = CreateObject("VBScript.RegExp")
    With regEx
        .Pattern = "(([0-9]+)x([0-9]+))"
        .IgnoreCase = True
        .Global = True
        Set inputMatches = .Execute(s)
        If regEx.test(s) Then
            regExSampler = .Replace(s, vbNullString)
        Else
            regExSampler = s
        End If
    End With
End Function
Public Sub TestMe()
    Debug.Print regExSampler("uni3uios3_300x250_ASDF.html")
    Debug.Print regExSampler("uni3uios3_34300x25_ASDF.html")
    Debug.Print regExSampler("uni3uios3_8x4_ASDF.html")
End Sub
If you run TestMe, you would get:
uni3uios3__ASDF.html
uni3uios3__ASDF.html
uni3uios3__ASDF.html
And this is exactly what I want to strip through RegEx.
Change the IF block to
If regEx.test(s) Then
regExSampler = InputMatches(0)
Else
regExSampler = s
End If
And your results will return
300x250
34300x25
8x4
This works because InputMatches holds the matches returned by Execute, so InputMatches(0) is the first substring that matched the pattern.
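The same idea, returning the first match instead of replacing it, can be sketched in Python. This is only an illustration under the assumption that the [0-9]+x[0-9]+ pattern behaves the same as in VBScript.RegExp; AD_SIZE and extract_ad_size are hypothetical names, not part of the original code.

```python
import re

# Same pattern as in the question, without the capture groups (they are not needed here).
AD_SIZE = re.compile(r"[0-9]+x[0-9]+", re.IGNORECASE)


def extract_ad_size(s: str) -> str:
    """Return the first WIDTHxHEIGHT substring, or the input unchanged if there is none."""
    match = AD_SIZE.search(s)
    return match.group(0) if match else s


if __name__ == "__main__":
    for name in ("uni3uios3_300x250_ASDF.html",
                 "uni3uios3_34300x25_ASDF.html",
                 "uni3uios3_8x4_ASDF.html"):
        print(extract_ad_size(name))
```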
As requested by the OP, I'm posting this as an answer:
Solution:
^.*\D(?=\d+x\d+)|\D+$
Demonstration: regex101.com
Explanation:
^.*\D - Here we're matching every character from the start of the string up to and including a non-digit (\D) character. Because .* is greedy, this is the last non-digit for which the lookahead that follows succeeds.
(?=\d+x\d+) - This is a positive lookahead. It means that the previous pattern (^.*\D) should only match if followed by the pattern described inside it (\d+x\d+). The lookahead itself doesn't consume any characters, so the text matching \d+x\d+ is not part of the match.
\d+x\d+ - This one should be easy to understand because it's equivalent to [0-9]+x[0-9]+. As you see, \d is a token that represents any digit character.
\D+$ - This pattern matches one or more non-digit characters until it reaches the end of the string.
Finally, both patterns are linked by an OR condition (|) so that the whole regex matches one pattern or another.
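To see the two alternatives working together, here is a small Python sketch that replaces every match with an empty string. It assumes the pattern behaves the same in Python's re module as in VBScript.RegExp with Global = True; STRIP_AROUND_SIZE and strip_to_ad_size are illustrative names only.

```python
import re

# First alternative removes the prefix before the size,
# second alternative removes the trailing non-digit run.
STRIP_AROUND_SIZE = re.compile(r"^.*\D(?=\d+x\d+)|\D+$")


def strip_to_ad_size(s: str) -> str:
    """Remove everything the pattern matches, leaving only the size part."""
    return STRIP_AROUND_SIZE.sub("", s)


if __name__ == "__main__":
    for name in ("uni3uios3_300x250_ASDF.html",
                 "uni3uios3_34300x25_ASDF.html",
                 "uni3uios3_8x4_ASDF.html"):
        print(strip_to_ad_size(name))
```

Note that the second alternative only strips a trailing run with no digits in it, so a suffix such as _v2.html would not be removed completely.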

Regex for Uppercase Letters, Numbers and dashes only

I have struggled with this expression for 2 days now so I thought I'd ask for some proper help from the world of knowledge. I hope someone can help.
This is the RegEx I built to get me what I want.
\S*\d*?-[A-Z]*[0-9]*
I only want the Uppercase Letters and Numbers with dashes, so it does get GC-113, AO-1-GC-113, AO-2-GC-113, which is great!
"I don't want this ------, but this is good GC-113, AO-1-GC-113, AO-2-GC-113"
BUT if I come across one where there is no space between the number, but just another character like a comma or a period then it returns a match on the entire section "GC-113,AO-1-GC-113,AO-2-GC-113"
"I don't want this ------, but this is good GC-113,AO-1-GC-113,AO-2-GC-113"
I'm using RegExBuddy to try and figure this out.
This is the VBA code I'm using to get the matches.
Public Function GetRIs(ByVal vstrInString As String) As Collection
    Dim myRegExp As RegExp
    Dim myMatches As Variant
    Dim myMatch As Variant
    Set GetRIs = New Collection
    Set myRegExp = New RegExp
    myRegExp.Global = True
    myRegExp.Pattern = "\S*\d*?-[A-Z]*[0-9]*"
    Set myMatches = myRegExp.Execute(vstrInString)
    For Each myMatch In myMatches
        If myMatch.Value <> "" Then
            GetRIs.Add myMatch.Value
        End If
    Next
End Function
Thanks!
Dave
Your \S*\d*?-[A-Z]*[0-9]* pattern can even match a single hyphen as only - is obligatory and the rest of subpatterns can match zero times (can be absent from the string).
You can use
myRegExp.Pattern = "\b[A-Z0-9]+(?:-[A-Z0-9]+)+"
The pattern matches:
\b - a word boundary (before the next letter or digit there must be a non-word character or the start of the string)
[A-Z0-9]+ - one or more uppercase letters or digits
(?:-[A-Z0-9]+)+ - 1 or more sequences of:
- - a hyphen
[A-Z0-9]+ - one or more uppercase letters or digits
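The behaviour of both patterns on the comma-separated input can be checked with a short Python sketch. It assumes the patterns behave the same in Python's re module as in the VBScript RegExp object; get_ris mirrors the GetRIs function from the question but is otherwise a hypothetical helper.

```python
import re

# Pattern from the question: only the hyphen is mandatory.
ORIGINAL = re.compile(r"\S*\d*?-[A-Z]*[0-9]*")
# Pattern from the answer: at least two alphanumeric chunks joined by hyphens.
SUGGESTED = re.compile(r"\b[A-Z0-9]+(?:-[A-Z0-9]+)+")


def get_ris(text: str) -> list:
    """Return every hyphenated uppercase/digit identifier found in text."""
    return SUGGESTED.findall(text)


if __name__ == "__main__":
    text = "I don't want this ------, but this is good GC-113,AO-1-GC-113,AO-2-GC-113"
    print(get_ris(text))
    # The original pattern accepts a lone hyphen as a complete match.
    print(ORIGINAL.fullmatch("-") is not None)
```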

How to match the particular part from the nth index of the specific character?

I have the input data as,
"Thumbnail":"/images/7.0.2.5076_1/spacer.gif","URL":"http://id800/home/LayoutManager/l1.html/1407462681_292_2_2_1398567201/"
And I want to match the l1.html part of it. It can be anything. So I want to match the part of the URL that occurs after the third-last / and before the second-last /. That part is either a number, an alphanumeric string, or an alphanumeric string with an .html extension. So basically I want to match the part between the 3rd and 2nd / from the end. I tried lots of combinations but was unable to come up with one. Any help would be great.
Pattern:
\".+?(\w+\.\w{3,5})\/.+?\"
\" will match starting and ending quote
.+? will match one or more characters, as few as possible
\w+ will match one or more word characters (letters, digits or underscore)
\. will match .(dot)
\w{3,5} will match 3 to 5 word characters
\/ will match /(forward slash)
() these parenthesis capture in separate group
Code in action:
// Requires: using System.Text.RegularExpressions;
string pattern = "\".+?(\\w+\\.\\w{3,5})\\/.+?\"";
string text = "\"Thumbnail\":\"/images/7.0.2.5076_1/spacer.gif\",\"URL\":\"http://id800/home/LayoutManager/l1.html/1407462681_292_2_2_1398567201/\"";
MatchCollection matches = Regex.Matches(text, pattern);
// Regex.Matches never returns null; check Count before indexing to avoid an exception.
if (matches.Count > 0)
{
    string value = matches[0].Groups[1].Value; // Output: l1.html
}
You have not provided the whole JSON string, but I think my snippet will help you get what you want anyway without regex. Add a reference to System.Web.Extensions, and use the following code:
Dim s As String = "[{""Thumbnail"":""/images/7.0.2.5076_1/spacer.gif"",""URL"":""http://id800/home/LayoutManager/l1.html/1407462681_292_2_2_1398567201/""}]" ' "[{""application_id"":""1"",""application_package"":""abc""},{""application_id"":""2"",""application_package"":""xyz""}]"
Dim jss As New System.Web.Script.Serialization.JavaScriptSerializer()
Dim dict = jss.Deserialize(Of List(Of Object))(s)
For Each d In dict
    For Each v In d
        If v.Key = "URL" Then
            Dim tmp = v.Value.Trim("/"c).ToString().Split("/"c)
            MsgBox(tmp(tmp.Length - 2))
        End If
    Next
Next
Result: l1.html
The substring you need can be obtained without a regex by simply splitting the value on / and taking the second-to-last element.
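The split-based approach is easy to try in isolation. Below is a minimal Python sketch of it, assuming the URL value has already been extracted from the JSON; second_to_last_segment is a hypothetical helper name, and returning an empty string for short inputs is a choice made here, not something the original answer specifies.

```python
def second_to_last_segment(url: str) -> str:
    """Return the path segment between the 3rd and 2nd '/' counted from the end.

    Leading and trailing slashes are trimmed first, mirroring Trim("/"c) in the
    VB.NET snippet. An empty string is returned when there are fewer than two segments.
    """
    parts = url.strip("/").split("/")
    return parts[-2] if len(parts) >= 2 else ""


if __name__ == "__main__":
    url = "http://id800/home/LayoutManager/l1.html/1407462681_292_2_2_1398567201/"
    print(second_to_last_segment(url))
```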

Regular expression in vb.net

How do I check whether a particular value starts with a letter or a digit? My code is attached below. I am getting an error like "identifier expected".
code
----
Dim i As String
dim ReturnValue as boolean
i = 400087
Dim s_str As String = i.Substring(0, 1)
Dim regex As Regex = New Regex([(a - z)(A-Z)])
ReturnValue = Regex.IsMatch(s_str, Regex)
error
regx is type and cant be used as an expression
Your variable is regex; Regex is the type of the variable. Call IsMatch on the variable, which then takes only the input string.
So it is:
ReturnValue = regex.IsMatch(s_str)
But your regex is also flawed. [(a - z)(A-Z)] creates a character class that matches exactly the characters (, ), a, z and a space (the " - " in the middle is read as a range from space to space), plus the range A-Z, and nothing else.
It looks to me as if you want to match letters. For that just use \p{L} that is a Unicode property that would match any character that is a letter in any language.
Dim regex As Regex = New Regex("[\p{L}\d]")
maybe you mean
Dim _regex As Regex = New Regex("[(a-z)(A-Z)]")
Dim regex As Regex = New Regex([(a - z)(A-Z)])
ReturnValue = Regex.IsMatch(s_str, Regex)
Note case difference, use regex.IsMatch. You also need to quote the regex string: "[(a - z)(A-Z)]".
Finally, that regex doesn't make sense, you are matching any letter or opening/closing parenthesis anywhere in the string.
To match at the start of the string you need to include the start anchor ^, something like: ^[a-zA-Z] matches any ASCII letter at the start of the string.
Check if a string starts with a letter or digit:
ReturnValue = Regex.IsMatch(s_str,"^[a-zA-Z0-9]+")
Regex Explanation:
^ # Matches start of string
[a-zA-Z0-9] # Followed by any letter or number
+ # at least one letter or number
See it in action here.
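As a sanity check of the anchored character class, here is a small Python sketch. It assumes ^[a-zA-Z0-9] behaves the same in Python's re module as in .NET; starts_with_letter_or_digit is an illustrative name. A single character after ^ is enough for a yes/no test, so the + quantifier is left out.

```python
import re

# ^ anchors the test at the start; one character from the class is enough.
STARTS_ALNUM = re.compile(r"^[a-zA-Z0-9]")


def starts_with_letter_or_digit(value) -> bool:
    """Return True when the text form of value begins with an ASCII letter or digit."""
    return STARTS_ALNUM.match(str(value)) is not None


if __name__ == "__main__":
    for candidate in (400087, "abc", "-12", ""):
        print(repr(candidate), starts_with_letter_or_digit(candidate))
```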

Recognize numbers in french format inside document using regex

I have a document containing numbers in various formats, french, english, custom formats.
I wanted a regex that could catch ONLY numbers in french format.
This is a complete list of numbers I want to catch (d represents a digit, decimal separator is comma , and thousands separator is space)
d,d d,dd d,ddd
dd,d dd,dd dd,ddd
ddd,d ddd,dd ddd,ddd
d ddd,d d ddd,dd d ddd,ddd
dd ddd,d dd ddd,dd dd ddd,ddd
ddd ddd,d ddd ddd,dd ddd ddd,ddd
d ddd ddd,d...
dd ddd ddd,d...
ddd ddd ddd,d...
This is the regex I have
(\d{1,3}\s(\d{3}\s)*\d{3}(\,\d{1,3})?|\d{1,3}\,\d{1,3})
It catches French formats like the ones above, so I am on the right track, but it also catches numbers like d,ddd.dd (because it catches d,ddd) or d,ddd,ddd (because it catches d,ddd).
What should I add to my regex ?
The VBA code I have:
Sub ChangeNumberFromFRformatToENformat()
    Dim SectionText As String
    Dim RegEx As Object, RegC As Object, RegM As Object
    Dim i As Integer
    Set RegEx = CreateObject("vbscript.regexp")
    With RegEx
        .Global = True
        .MultiLine = False
        .Pattern = "(\d{1,3}\s(\d{3}\s)*\d{3}(\,\d{1,3})?|\d{1,3}\,\d{1,3})"
        ' regular expression used by the macro to recognise FR-formatted numbers
    End With
    For i = 1 To ActiveDocument.Sections.Count()
        SectionText = ActiveDocument.Sections(i).Range.Text
        If RegEx.test(SectionText) Then
            Set RegC = RegEx.Execute(SectionText)
            ' RegC: regular expression match collection, holding French-format numbers
            For Each RegM In RegC
                Call ChangeThousandAndDecimalSeparator(RegM.Value)
            Next 'For Each RegM In RegC
            Set RegC = Nothing
            Set RegM = Nothing
        End If
    Next 'For i = 1 To ActiveDocument.Sections.Count()
    Set RegEx = Nothing
End Sub
The user stema gave me a nice solution. The regex should be:
(?<=^|\s)\d{1,3}(?:\s\d{3})*(?:\,\d{1,3})?(?=\s|$)
But VBA complains that the regexp has unescaped characters. I found one in (?: \d{3}), between (?: and \d{3}): a blank character, which I can replace with \s. The second one, I think, is in (?:,\d{1,3}), between (?: and \d: the comma character, which becomes \, when escaped.
So the regex is now (?<=^|\s)\d{1,3}(?:\s\d{3})*(?:\,\d{1,3})?(?=\s|$) and it works fine in RegExr but my VBA code will not accept it.
NEW LINE IN POST :
I have just discovered that VBA doesn't agree with this sequence of the regex ?<=^
What about this?
\b\d{1,3}(?: \d{3})*(?:,\d{1,3})?\b
See it here on Regexr
\b are word boundaries
First, \d{1,3} matches 1 to 3 digits; then there can be 0 or more groups of a space followed by 3 digits ((?: \d{3})*); finally there can be an optional fractional part ((?:,\d{1,3})?).
Edit:
If you want to avoid matching part of 1,111.1, then the \b anchors are not good for you. Try this:
(?<=^|\s)\d{1,3}(?: \d{3})*(?:,\d{1,3})?(?=\s|$)
This regex requires now a whitespace or the start of the string before and a whitespace or the end of the string after the number to match.
Edit 2:
Since look behinds are not supported you can change to
(?:^|\s)\d{1,3}(?: \d{3})*(?:,\d{1,3})?(?=\s|$)
This changes nothing at the start of the string, but if the number is preceded by whitespace, that whitespace is now included in the match. If the match result is used for anything, the leading whitespace has to be stripped first (I am quite sure VBA has a method for that; try Trim()).
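The difference between the \b version and the lookbehind-free version, including the trimming step, can be tried out with a short Python sketch. It assumes both patterns behave the same in Python's re module as in VBScript.RegExp; WITH_BOUNDARIES, FRENCH_NUMBER and find_french_numbers are illustrative names.

```python
import re

# \b version: also matches the "1,111" inside an English-style "1,111.1".
WITH_BOUNDARIES = re.compile(r"\b\d{1,3}(?: \d{3})*(?:,\d{1,3})?\b")
# Lookbehind-free version: needs start-of-string or whitespace before the number,
# and whitespace or end-of-string after it.
FRENCH_NUMBER = re.compile(r"(?:^|\s)\d{1,3}(?: \d{3})*(?:,\d{1,3})?(?=\s|$)")


def find_french_numbers(text: str) -> list:
    """Return French-format numbers, with the leading whitespace of each match removed."""
    return [m.group(0).strip() for m in FRENCH_NUMBER.finditer(text)]


if __name__ == "__main__":
    print(WITH_BOUNDARIES.findall("1,111.1"))
    print(find_french_numbers("1 234,56 and 1,111.1 and 12,5"))
```

The strip() call plays the role of Trim() in VBA: the leading whitespace is part of the match, so it is removed before the value is used.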
If you are reading on a line by line basis, you might consider adding anchors (^ and $) to your regex, so you will end up with something like so:
^(\d{1,3}\s(\d{3}\s)*\d{3}(\,\d{1,3})?|\d{1,3}\,\d{1,3})$
This instructs the RegEx engine to start matching from the beginning of the line till the very end.