Regex for Uppercase Letters, Numbers and dashes only - regex

I have struggled with this expression for 2 days now so I thought I'd ask for some proper help from the world of knowledge. I hope someone can help.
This is the RegEx I built to get me what I want.
\S*\d*?-[A-Z]*[0-9]*
I only want the Uppercase Letters and Numbers with dashes, so it does get GC-113, AO-1-GC-113, AO-2-GC-113, which is great!
"I don't want this ------, but this is good GC-113, AO-1-GC-113, AO-2-GC-113"
BUT if I come across one where there is no space between the number, but just another character like a comma or a period then it returns a match on the entire section "GC-113,AO-1-GC-113,AO-2-GC-113"
"I don't want this ------, but this is good GC-113,AO-1-GC-113,AO-2-GC-113"
I'm using RegExBuddy to try and figure this out.
This is the VBA code I'm using to get the matches.
Public Function GetRIs(ByVal vstrInString As String) As Collection
    Dim myRegExp As RegExp
    Dim myMatches As Variant
    Dim myMatch As Variant
    Set GetRIs = New Collection
    Set myRegExp = New RegExp
    myRegExp.Global = True
    myRegExp.Pattern = "\S*\d*?-[A-Z]*[0-9]*"
    Set myMatches = myRegExp.Execute(vstrInString)
    For Each myMatch In myMatches
        If myMatch.Value <> "" Then
            GetRIs.Add myMatch.Value
        End If
    Next
End Function
Thanks!
Dave

Your \S*\d*?-[A-Z]*[0-9]* pattern can match even a single hyphen, because only the - is required; every other subpattern can match zero times (that is, be absent from the string).
You can use
myRegExp.Pattern = "\b[A-Z0-9]+(?:-[A-Z0-9]+)+"
The pattern matches:
\b - a word boundary (before the next letter or digit there must be a non-word character or the start of the string)
[A-Z0-9]+ - one or more uppercase letters or digits
(?:-[A-Z0-9]+)+ - 1 or more sequences of:
- - a hyphen
[A-Z0-9]+ - one or more uppercase letters or digits
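To compare the two patterns quickly, here is a minimal sketch in Python rather than VBA. That is an assumption made only so the snippet is easy to run; both patterns use syntax that Python's re module and VBScript's RegExp share. The variable names are illustrative.

```python
import re

# Pattern from the question and pattern from the answer.
OLD_PATTERN = r"\S*\d*?-[A-Z]*[0-9]*"
NEW_PATTERN = r"\b[A-Z0-9]+(?:-[A-Z0-9]+)+"

# The two sample sentences from the question: with and without spaces after the commas.
spaced = "I don't want this ------, but this is good GC-113, AO-1-GC-113, AO-2-GC-113"
packed = "I don't want this ------, but this is good GC-113,AO-1-GC-113,AO-2-GC-113"

# The old pattern accepts a lone hyphen, because everything except "-" is optional.
old_matches_lone_hyphen = re.fullmatch(OLD_PATTERN, "-") is not None

# The new pattern needs at least one letter/digit run on each side of every hyphen.
spaced_ids = re.findall(NEW_PATTERN, spaced)
packed_ids = re.findall(NEW_PATTERN, packed)

print(old_matches_lone_hyphen)
print(spaced_ids)
print(packed_ids)
```

Both sentences should yield the same three identifiers, and the run of hyphens is skipped because it contains no letter or digit.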

Related

Use Regex to Split Numbered List array into Numbered List Multiline

I am trying to learn Regex to answer a question on SO portuguese.
Input (Array or String on a Cell, so .MultiLine = False)?
1 One without dot. 2. Some Random String. 3.1 With SubItens. 3.2 With number 0n mid. 4. Number 9 incorrect. 11.12 More than one digit. 12.7 Ending (no word).
Output
1 One without dot.
2. Some Random String.
3.1 With SubItens.
3.2 With number 0n mid.
4. Number 9 incorrect.
11.12 More than one digit.
12.7 Ending (no word).
What I thought was to use Regex with Split, but I wasn't able to implement the example in Excel.
Imports System.Text.RegularExpressions
Module Example
    Public Sub Main()
        Dim input As String = "plum-pear"
        Dim pattern As String = "(-)"
        Dim substrings() As String = Regex.Split(input, pattern) ' Split on hyphens.
        For Each match As String In substrings
            Console.WriteLine("'{0}'", match)
        Next
    End Sub
End Module
' The method writes the following to the console:
' 'plum'
' '-'
' 'pear'
So reading this and this. The RegExr Website was used with the expression /([0-9]{1,2})([.]{0,1})([0-9]{0,2})/igm on the Input.
And the following is obtained:
Is there a better way to do this? Is the regex correct, or is there a better way to build it? The examples I found on Google didn't enlighten me on how to use RegEx with Split correctly.
Maybe I am confused about the logic of the Split function: I wanted to get the split index, with the regex as the separator string.
I can guarantee that each item ends with a word and a period.
Use
\d+(?:\.\d+)*[\s\S]*?\w+\.
See the regex demo.
Details
\d+ - 1 or more digits
(?:\.\d+)* - zero or more sequences of:
\. - dot
\d+ - 1 or more digits
[\s\S]*? - any 0+ chars, as few as possible, up to the first...
\w+\. - 1+ word chars followed by a literal dot.
Here is a sample VBA code:
Dim str As String
Dim objRegExp As Object
Dim objMatches As Object
Dim m As Object
str = " 1 One without dot. 2. Some Random String. 3.1 With SubItens. 3.2 With Another SubItem. 4. List item. 11.12 More than one digit."
Set objRegExp = New RegExp ' or CreateObject("VBScript.RegExp") without a library reference
objRegExp.Pattern = "\d+(?:\.\d+)*[\s\S]*?\w+\."
objRegExp.Global = True ' find every list item, not just the first
Set objMatches = objRegExp.Execute(str)
If objMatches.Count <> 0 Then
    For Each m In objMatches
        Debug.Print m.Value ' one list item per line
    Next
End If
NOTE
You may require the matches to only stop at the word + . that are followed with 0+ whitespaces and a number using \d+(?:\.\d+)*[\s\S]*?[a-zA-Z]+\.(?=\s*(?:\d+|$)).
The (?=\s*(?:\d+|$)) positive lookahead requires the presence of 0+ whitespaces (\s*) followed with 1+ digits (\d+) or end of string ($) immediately to the right of the current location.
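As a quick cross-check of both patterns, here is a minimal sketch in Python instead of VBA (an assumption made for easy testing; the constructs used here are also available in VBScript's RegExp). It runs on the sample string from the VBA code above; the variable names are illustrative.

```python
import re

# Basic pattern and the stricter variant with the lookahead from the note.
ITEM_PATTERN = r"\d+(?:\.\d+)*[\s\S]*?\w+\."
STRICT_ITEM_PATTERN = r"\d+(?:\.\d+)*[\s\S]*?[a-zA-Z]+\.(?=\s*(?:\d+|$))"

text = (" 1 One without dot. 2. Some Random String. 3.1 With SubItens. "
        "3.2 With Another SubItem. 4. List item. 11.12 More than one digit.")

# Each match is one numbered list item; join them to get the multiline output.
items = re.findall(ITEM_PATTERN, text)
strict_items = re.findall(STRICT_ITEM_PATTERN, text)

print("\n".join(items))
```

On this input both patterns give the same six items. They only differ on text where a word followed by a dot appears in the middle of an item.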
If VBA's split supports look-ahead regex then this one may work, assuming there's no digit except in the indexes:
\s(?=\d)

Validating a string's first 3 letters as uppercase with regex

I have a question on Classic ASP regarding validating a string's first 3 letters to be uppercase while the last 4 characters should be in numerical form using regex.
For e.g.:
dim myString = "abc1234"
How do I validate that it should be "ABC1234" instead of "abc1234"?
Apologies for my broken English and for being a newbie in Classic ASP.
@ndn has a good regex pattern for you. To apply it in Classic ASP, you just need to create a RegExp object that uses the pattern and then call the Test() function to test your string against the pattern.
For example:
Dim re
Set re = New RegExp
re.Pattern = "^[A-Z]{3}.*[0-9]{4}$" ' @ndn's pattern
If re.Test(myString) Then
    ' Match. First three characters are uppercase letters and last four are digits.
Else
    ' No match.
End If
^[A-Z]{3}.*[0-9]{4}$
Explanation:
Surround everything with ^$ (start and end of string) to ensure you are matching everything
[A-Z] - gives you all capital letters in the English alphabet
{3} - three of those
.* - optionally, there can be something in between (if there can't be, you can just remove this)
[0-9] - any digit
{4} - 4 of those
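The effect of the optional .* in the middle can be checked with a minimal sketch, written here in Python rather than Classic ASP (an assumption for easy testing). The function names and the stricter variant without .* are illustrative, not part of the answers above.

```python
import re

# Pattern from the answer: three uppercase letters, anything, four digits.
LOOSE = re.compile(r"^[A-Z]{3}.*[0-9]{4}$")
# Variant with the ".*" removed: exactly three letters followed by four digits.
STRICT = re.compile(r"^[A-Z]{3}[0-9]{4}$")


def is_valid_loose(value: str) -> bool:
    """True if value starts with 3 uppercase letters and ends with 4 digits."""
    return LOOSE.search(value) is not None


def is_valid_strict(value: str) -> bool:
    """True if value is exactly 3 uppercase letters followed by 4 digits."""
    return STRICT.search(value) is not None


print(is_valid_loose("ABC1234"), is_valid_loose("abc1234"))
print(is_valid_loose("ABCxyz1234"), is_valid_strict("ABCxyz1234"))
```

If the string must be exactly seven characters long, as in the question's example, the stricter variant is probably the one to use.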

VB.NET regex searching for AAA-9999

I need help with finding the first 3 capital letters A-Z and then a space followed by 4 numbers 0-9.
Dim IndividualClasses As MatchCollection = Regex.Matches(AllExitClasses(a), "([A-Z])([A-Z])([A-Z]) ([0-9])([0-9])([0-9])([0-9])")
An example input string would be AML 4309 or DEF 4298.
The above 7 characters are what I want to get out of string.
EDIT: Since you preprocess your input string, you can use
Dim IndividualClasses As MatchCollection = Regex.Matches(AllExitClasses(a).Replace(" ", "-"), "[A-Z]{3}[-][0-9]{4}")
REGEX EXPLANATION:
[A-Z]{3} - 3 occurrences of English letters A to Z
[-] - A character class matching exactly one hyphen
[0-9]{4} - Exactly 4 occurrences of digits from 0 to 9.
Note that I removed capturing groups since you do not seem to be using them at all, and I am using limiting quantifiers, e.g. {4}.
Note that you could use your input string as is and previous regex [A-Z]{3}\p{Zs}[0-9]{4}, but you would need to iterate through the match collection and replace a space in each Match.Value with a hyphen creating a new array.
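The second approach (match on the original string, then replace the space in each match) might look like the following minimal sketch. It is written in Python rather than VB.NET, and it uses a literal space instead of \p{Zs} because Python's built-in re module does not support \p{...} classes. The function name and sample text are illustrative.

```python
import re

# Three uppercase letters, one space, four digits.
COURSE_PATTERN = re.compile(r"[A-Z]{3} [0-9]{4}")


def extract_courses(text: str) -> list[str]:
    """Find every 'AAA 9999' code and return it as 'AAA-9999'."""
    return [code.replace(" ", "-") for code in COURSE_PATTERN.findall(text)]


courses = extract_courses("An example input string would be AML 4309 or DEF 4298.")
print(courses)
```

This leaves the input string untouched, at the cost of building a new list.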
Here is an IDEONE demo
OK, I replaced the spaces with a dash, and I am now using this regular expression, which works:
"([A-Z])([A-Z])([A-Z])([-])([0-9])([0-9])([0-9])([0-9])"
AllExitClasses(a) = AllExitClasses(a).Replace(" ", "-")
'
MyClassString = AllExitClasses(a).ToString
Dim IndividualClasses As MatchCollection = Regex.Matches(MyClassString, "([A-Z])([A-Z])([A-Z])([-])([0-9])([0-9])([0-9])([0-9])")
Regex.Matches([variable], "^([A-Z]{3,3})(\s)([0-9]{4,4})$")
This regex will match a string of the form AAA 1111: exactly 3 uppercase letters with [A-Z]{3,3}, one whitespace character with (\s), and exactly 4 digits with ([0-9]{4,4}). Because of the ^ and $ anchors, the whole input must have this form. I have found that http://regex101.com helps a lot with expressions in different languages.

Regular expression in vb.net

How do I check whether a particular value starts with a letter or a digit? My code is attached below. I am getting an error like "identifier expected".
code
----
Dim i As String
dim ReturnValue as boolean
i = 400087
Dim s_str As String = i.Substring(0, 1)
Dim regex As Regex = New Regex([(a - z)(A-Z)])
ReturnValue = Regex.IsMatch(s_str, Regex)
error
regx is type and cant be used as an expression
Your variable is regex, Regex is the type of the variable.
So it is:
ReturnValue = Regex.IsMatch(s_str, regex)
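But your regex is also flawed. [(a - z)(A-Z)] creates a character class that matches exactly the characters (, ), a, z, a space, and the range A-Z, and nothing else. (The " - " in the middle is read as a range from space to space, so it contributes only the space.)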
It looks to me as if you want to match letters. For that just use \p{L} that is a Unicode property that would match any character that is a letter in any language.
Dim regex As Regex = New Regex("[\p{L}\d]")
maybe you mean
Dim _regex As Regex = New Regex("[(a-z)(A-Z)]")
Dim regex As Regex = New Regex([(a - z)(A-Z)])
ReturnValue = Regex.IsMatch(s_str, Regex)
Note case difference, use regex.IsMatch. You also need to quote the regex string: "[(a - z)(A-Z)]".
Finally, that regex doesn't make sense, you are matching any letter or opening/closing parenthesis anywhere in the string.
To match at the start of the string you need to include the start anchor ^, something like: ^[a-zA-Z] matches any ASCII letter at the start of the string.
Check if a string starts with a letter or digit:
ReturnValue = Regex.IsMatch(s_str,"^[a-zA-Z0-9]+")
Regex Explanation:
^ # Matches start of string
[a-zA-Z0-9] # Followed by any letter or number
+ # at least one letter or number
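A minimal sketch of the start-of-string check, written in Python rather than VB.NET (an assumption for easy testing; the function and constant names are illustrative). It also shows that the bracket expression from the question does not cover ordinary lowercase letters or digits.

```python
import re

# Anchored check: the first character must be an ASCII letter or digit.
STARTS_WITH_ALNUM = re.compile(r"^[a-zA-Z0-9]")
# The bracket expression from the question, quoted as a string.
FLAWED_CLASS = re.compile(r"[(a - z)(A-Z)]")


def starts_with_letter_or_digit(value: str) -> bool:
    """True if value begins with an ASCII letter or digit."""
    return STARTS_WITH_ALNUM.search(value) is not None


print(starts_with_letter_or_digit("400087"))
print(starts_with_letter_or_digit("-x"))
print(FLAWED_CLASS.fullmatch("b") is None)
```

One character class is enough here, so the trailing + in the answer's pattern does not change the True/False result.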
See it in action here.

Recognize numbers in french format inside document using regex

I have a document containing numbers in various formats: French, English, and custom formats.
I want a regex that catches ONLY numbers in French format.
This is a complete list of numbers I want to catch (d represents a digit, decimal separator is comma , and thousands separator is space)
d,d d,dd d,ddd
dd,d dd,dd dd,ddd
ddd,d ddd,dd ddd,ddd
d ddd,d d ddd,dd d ddd,ddd
dd ddd,d dd ddd,dd dd ddd,ddd
ddd ddd,d ddd ddd,dd ddd ddd,ddd
d ddd ddd,d...
dd ddd ddd,d...
ddd ddd ddd,d...
This is the regex I have
(\d{1,3}\s(\d{3}\s)*\d{3}(\,\d{1,3})?|\d{1,3}\,\d{1,3})
It catches French formats like the ones above, so I am on the right track, but it also catches parts of numbers like d,ddd.dd or d,ddd,ddd (because it matches the d,ddd inside them).
What should I add to my regex ?
The VBA code I have:
Sub ChangeNumberFromFRformatToENformat()
    Dim SectionText As String
    Dim RegEx As Object, RegC As Object, RegM As Object
    Dim i As Integer
    Set RegEx = CreateObject("vbscript.regexp")
    With RegEx
        .Global = True
        .MultiLine = False
        .Pattern = "(\d{1,3}\s(\d{3}\s)*\d{3}(\,\d{1,3})?|\d{1,3}\,\d{1,3})"
        ' regular expression used by the macro to recognise FR-formatted numbers
    End With
    For i = 1 To ActiveDocument.Sections.Count()
        SectionText = ActiveDocument.Sections(i).Range.Text
        If RegEx.test(SectionText) Then
            Set RegC = RegEx.Execute(SectionText)
            ' RegC: regular expression match collection, holding French-format numbers
            For Each RegM In RegC
                Call ChangeThousandAndDecimalSeparator(RegM.Value)
            Next 'For Each RegM In RegC
            Set RegC = Nothing
            Set RegM = Nothing
        End If
    Next 'For i = 1 To ActiveDocument.Sections.Count()
    Set RegEx = Nothing
End Sub
The user stema, gave me a nice solution. The regex should be:
(?<=^|\s)\d{1,3}(?:\s\d{3})*(?:\,\d{1,3})?(?=\s|$)
But VBA complains that the regexp has unescaped characters. I found one in (?: \d{3}): the blank character between ?: and \d{3}, which I can replace with \s. The second one, I think, is the comma in (?:,\d{1,3}), between ?: and \d; escaped, it becomes \, .
So the regex is now (?<=^|\s)\d{1,3}(?:\s\d{3})*(?:\,\d{1,3})?(?=\s|$) and it works fine in RegExr but my VBA code will not accept it.
NEW LINE IN POST :
I have just discovered that VBA doesn't agree with this sequence of the regex ?<=^
What about this?
\b\d{1,3}(?: \d{3})*(?:,\d{1,3})?\b
See it here on Regexr
\b are word boundaries
At first (\d{1,3}) match 1 to 3 digits, then there can be 0 or more groups of a leading space followed by 3 digits ((?: \d{3})*) and at last there can be an optional fraction part ((?:,\d{1,3})?)
Edit:
if you want to avoid 1,111.1 then the \b anchors are not good for you. Try this:
(?<=^|\s)\d{1,3}(?: \d{3})*(?:,\d{1,3})?(?=\s|$)
Regexr
This regex requires now a whitespace or the start of the string before and a whitespace or the end of the string after the number to match.
Edit 2:
Since look behinds are not supported you can change to
(?:^|\s)\d{1,3}(?: \d{3})*(?:,\d{1,3})?(?=\s|$)
This changes nothing at the start of the string, but if the number is preceded by whitespace, that whitespace is now included in the match. If you use the matched value for anything, strip the leading whitespace first (I am quite sure VBA has a method for that; try Trim()).
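Here is a minimal sketch of the look-behind-free pattern together with the trimming step, written in Python rather than VBA (an assumption for easy testing; Python does support look-behind, but the pattern deliberately avoids it to mirror the VBScript limitation). The function name and sample text are illustrative.

```python
import re

# Word-boundary version from the first suggestion, kept for comparison.
BOUNDARY_PATTERN = re.compile(r"\b\d{1,3}(?: \d{3})*(?:,\d{1,3})?\b")
# Version without look-behind: the leading whitespace becomes part of the match.
NO_LOOKBEHIND_PATTERN = re.compile(r"(?:^|\s)\d{1,3}(?: \d{3})*(?:,\d{1,3})?(?=\s|$)")


def find_french_numbers(text: str) -> list[str]:
    """Return French-format numbers, with the captured leading whitespace removed."""
    return [m.group(0).strip() for m in NO_LOOKBEHIND_PATTERN.finditer(text)]


sample = "Total 1 234,56 and 1,234.56 and 12,5 end"
print(find_french_numbers(sample))

# The \b version still picks pieces out of an English-format number.
print(BOUNDARY_PATTERN.findall("1,111.1"))
```

One limitation of this sketch: because the separating whitespace is consumed by a match, two numbers separated by a single space may not both be found.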
If you are reading on a line by line basis, you might consider adding anchors (^ and $) to your regex, so you will end up with something like so:
^(\d{1,3}\s(\d{3}\s)*\d{3}(\,\d{1,3})?|\d{1,3}\,\d{1,3})$
This instructs the RegEx engine to start matching from the beginning of the line till the very end.