Extract an ID with regex

I am writing a program that gets lines of input in the following format:
Firstname, Lastname, ID number, contact info
I want to use regex to grab just the ID number, which is formatted like A######## where each # can be any digit.
I have googled but am having trouble understanding VB's regex patterns. Can anyone help me out?

In general, there is no such thing as "VB's regex". Both VB.NET and C# use the same .NET regular expression syntax:
In the .NET Framework, regular expression patterns are defined by a special syntax or language, which is compatible with Perl 5 regular expressions and adds some additional features such as right-to-left matching.
Nevertheless, your Regex should be:
A\d{8}
Which means: Match A, then match any digit (\d) exactly eight times.
Practical VB.NET usage:
Dim input As String = "Firstname, Lastname, A12345678, contact info"
Dim id As String = Regex.Match(input, "A\d{8}").Value

You can simply use:
\bA\d+\b
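This matches A followed by one or more digits, delimited by word boundaries; use \bA\d{8}\b instead if exactly eight digits are required.
In context: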
Imports System.Text.RegularExpressions
Module Module1
    Sub Main()
        Dim regex As Regex = New Regex("\bA\d+\b")
        Dim match As Match = regex.Match("Firstname, Lastname, A123456, Other stuff...")
        If match.Success Then
            Console.WriteLine(match.Value)
        End If
    End Sub
End Module
Working example: http://regex101.com/r/pB0pR5
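Combining the two answers, a minimal sketch might keep the word boundaries and still require exactly eight digits. The module name IdDemo and the helper ExtractId are hypothetical names chosen for illustration, not part of either answer.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module IdDemo
    ' Hypothetical helper: returns the first ID of the form A######## found
    ' in the line, or Nothing when the line contains no such ID.
    Function ExtractId(line As String) As String
        Dim m As Match = Regex.Match(line, "\bA\d{8}\b")
        If m.Success Then
            Return m.Value
        End If
        Return Nothing
    End Function

    Sub Main()
        Console.WriteLine(ExtractId("Firstname, Lastname, A12345678, contact info"))
        ' Only seven digits here, so no ID is reported for this line.
        Console.WriteLine(ExtractId("Firstname, Lastname, A1234567, contact info") Is Nothing)
    End Sub
End Module
```

Returning Nothing for a line without an ID lets the caller decide whether to skip the line or report it.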

Related

Regular expression formatting issue

I'm VERY new to using regular expressions, and I'm trying to figure out something simple.
I have a simple string, and I'm trying to pull out the 590111 and place it into another string.
HMax_590111-1_v8980.bin
So the new string would simply be...
590111
The part number will ALWAYS have 6 digits, and the string will ALWAYS have a version and such. The part number might change location inside the string, so it needs to work if it's like this:
590111-1_v8980_HMXAX.bin
What regular expression will do this? Currently, I'm using ^[0-9]* to find it if it's at the front of the file name.
Try the following Regex:
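Dim text As String = "590111-1_v8980_HMXAX.bin"
Dim pattern As String = "\d{6}"
'Instantiate the regular expression object.
'(RegexOptions.IgnoreCase is not needed: the pattern contains only digits.)
Dim r As Regex = New Regex(pattern)
'Match the regular expression pattern against a text string.
Dim m As Match = r.Match(text)
'm.Value holds the six digits when m.Success is True.
Dim partNumber As String = m.Value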
In a regex, \d denotes a digit, so you start by writing \d.
Since you know the number has a fixed length, you can specify that length with {}. Writing \d{6} means the pattern expects six consecutive digits.
I would recommend using this site to try out your own expressions. It also shows a little information about each part of the expression you are building when you hover over it.
Regex Tester
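Because \d{6} also matches the first six digits of any longer run, a slightly stricter sketch could add lookarounds so that only a run of exactly six digits counts. The names PartNumberDemo and ExtractPartNumber are made up for this example, and it assumes the part number is the first six-digit run in the file name.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module PartNumberDemo
    ' Hypothetical helper: returns the first run of exactly six digits,
    ' or an empty string when there is none.
    Function ExtractPartNumber(fileName As String) As String
        ' (?<!\d) and (?!\d) reject matches that sit inside a longer digit run.
        Return Regex.Match(fileName, "(?<!\d)\d{6}(?!\d)").Value
    End Function

    Sub Main()
        Console.WriteLine(ExtractPartNumber("HMax_590111-1_v8980.bin"))
        Console.WriteLine(ExtractPartNumber("590111-1_v8980_HMXAX.bin"))
    End Sub
End Module
```

Both sample names from the question yield the same six digits, regardless of where the part number sits in the string.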

Regex match a specific string

I am trying to extract the string <Num> from within Barcode(_<Num>_).PDF using Regex. I am looking at Regular Expression Language - Quick Reference but it is not easy. Thanks for any help.
Dim pattern As String = "^Barcode(_+_)\.pdf"
If Regex.IsMatch("Barcode(_abc123_).pdf", pattern) Then
Debug.Print("match")
End If
If you are trying not only to match but also to READ the value of <Num> into a variable, then you will need to call the Regex.Match method instead of the Boolean IsMatch method. The Match method returns a Match object that gives you access to the groups and captures from your pattern.
Your pattern would need to be something like "Barcode\(_(.*)_\)\.pdf". Note the inner parentheses, which create a capture group for the string between the underscores. See the MSDN docs for examples of almost exactly what you are doing.
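I don't know the regex in VB, but I can offer you a website for checking the correctness of your regex: Regex Tester. In this case, if <Num> consists of digits, you can use "Barcode\(_(\d+)_\)\.pdf" (the literal parentheses and the dot must be escaped; the inner group captures the digits).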
Just for the record, this is what I ended up using:
'set up regex
'I'm using + instead of * in the pattern to ensure that if no value is
'present the match will fail
Dim pattern As String = "Barcode\(_(.+)_\)\.pdf"
Dim r As Regex = New Regex(pattern, RegexOptions.IgnoreCase)
'get match
Dim mat As Match
mat = r.Match("Barcode(_abc123_).pdf")
'output the matched string
If mat.Success Then
    Dim g As Group = mat.Groups(1)
    Dim cc As CaptureCollection = g.Captures
    Dim c As Capture = cc(0)
    Debug.Print(c.ToString)
End If
.NET Framework Regular Expressions
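As a more compact variation on the code above, the captured value can also be read directly through a named group. This is only a sketch: the group name num, the added ^ and $ anchors, and the helper ExtractBarcodeValue are assumptions for illustration.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module BarcodeDemo
    ' Hypothetical helper: returns the text between the underscores,
    ' or Nothing when the file name does not have the expected shape.
    Function ExtractBarcodeValue(fileName As String) As String
        Dim pattern As String = "^Barcode\(_(?<num>.+)_\)\.pdf$"
        Dim m As Match = Regex.Match(fileName, pattern, RegexOptions.IgnoreCase)
        If m.Success Then
            ' Look the group up by name instead of by position.
            Return m.Groups("num").Value
        End If
        Return Nothing
    End Function

    Sub Main()
        Console.WriteLine(ExtractBarcodeValue("Barcode(_abc123_).PDF"))
    End Sub
End Module
```

RegexOptions.IgnoreCase is what lets the same pattern accept both .pdf and .PDF.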

VB.Net REGEX to strip email

I need to strip email addresses out of paragraphs of plain text. I have googled and searched this site and found many suggestions, none of which I can get to work. I'm using code like this:
Imports System.Text.RegularExpressions
Dim strEmailPattern As String = "^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})$"
Dim senText As String = "blah blah blah blah blah someone@somewhere.com"
Dim newText As String = String.Empty
newText = Regex.Replace(senText, strEmailPattern, String.Empty)
After the call to Regex.Replace, the newText string still contains the complete senText string, including the email. I thought the problem was the regex pattern, but I have tried many, so maybe I am missing something in the code?
This regex should match all the emails, with the following caveats:
- the matched addresses are not guaranteed to be valid;
- every email contains at least one @;
- each @ is surrounded by sequences of letters, digits, hyphens and dots, and each sequence starts with a letter;
- all emails are separated by at least a single space character.
POSIX regex:
([[:alpha:]][[:alnum:].-]+@)+[[:alpha:]][[:alnum:].-]+
.NET does not support POSIX bracket classes such as [[:alpha:]], so in VB.NET use the equivalent:
([a-zA-Z][a-zA-Z0-9.-]+@)+[a-zA-Z][a-zA-Z0-9.-]+
A shorter version (as in comment) would be
(\w[\w.-]+@)+\w[\w.-]+
But this will match some more invalid emails.
The pattern above will match most email addresses. If you really want to match all RFC 822 compliant emails, consider using the pattern here. It's a 6425-character regex that matches all standard email addresses. But beware, it will execute slowly!
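There are various corner cases where your regex would fail.
You could use something as simple as this (whitespace is excluded from both sides so that the match cannot run across neighbouring words):
(?<=^|\s)[^@\s]+@[^@\s]+(?=$|\s)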
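One likely reason the code in the question leaves the text unchanged is that its pattern begins with ^ and ends with $, so it can only match when the entire string is an email address. A minimal sketch without those anchors, assuming addresses are delimited by whitespace, might look like this (EmailDemo and StripEmails are hypothetical names, and the pattern is deliberately loose rather than a validator):

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module EmailDemo
    ' Hypothetical helper: removes anything that looks like an email address.
    ' The pattern is deliberately loose and is not a full address validator.
    Function StripEmails(source As String) As String
        ' No leading ^ or trailing $, so a match may start anywhere in the text.
        Dim pattern As String = "(?<=^|\s)[^@\s]+@[^@\s]+(?=$|\s)"
        Return Regex.Replace(source, pattern, String.Empty)
    End Function

    Sub Main()
        Dim senText As String = "blah blah blah blah blah someone@somewhere.com"
        Console.WriteLine(StripEmails(senText))
    End Sub
End Module
```

The whitespace around a removed address is left in place, so the caller may want to collapse repeated spaces afterwards.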

RegEx pattern to extract URLs

I have to extract everything between these characters:
<a href="/url?q=(text to extract whatever it is)&amp
I tried this pattern, but it's not working for me:
/(?<=url\?q=).*?(?=&amp)/
I'm programming in VB.NET. This is the code, but I think the problem is that the pattern is wrong:
Dim matches As MatchCollection
matches = regex.Matches(TextBox1.Text)
For Each Match As Match In matches
    listbox1.items.add(Match.Value)
Next
Could you help me please?
Your regex seems to be correct, except for the slashes (/) at the beginning and end of the expression. Remove them:
Dim regex = New Regex("(?<=url\?q=).*?(?=&amp)")
and it should work.
Some utilities and languages use / (forward slash) to delimit the search expression; others may use single quotes. With System.Text.RegularExpressions.Regex you don't need either, because the pattern is passed as an ordinary string.
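Putting the corrected pattern together with the loop from the question, a sketch could look like the following. It prints to the console instead of filling listbox1, and the sample HTML and the names UrlDemo and ExtractTargets are assumptions made for illustration.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module UrlDemo
    ' Hypothetical helper: collects the text between "url?q=" and "&amp".
    Function ExtractTargets(html As String) As List(Of String)
        Dim results As New List(Of String)()
        ' No surrounding slashes: the pattern is passed as a plain string.
        Dim rx As New Regex("(?<=url\?q=).*?(?=&amp)")
        For Each m As Match In rx.Matches(html)
            results.Add(m.Value)
        Next
        Return results
    End Function

    Sub Main()
        ' Made-up sample input in the shape described in the question.
        Dim sample As String = "<a href=""/url?q=http://example.com/page&amp;sa=U"">link</a>"
        For Each target As String In ExtractTargets(sample)
            Console.WriteLine(target)
        Next
    End Sub
End Module
```

The lookbehind and lookahead keep "url?q=" and "&amp" out of the returned value, so no trimming is needed afterwards.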
The regex below will extract all URLs from your text (or any other):
(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?

RegEx to match text and enclosing braces

I need to loop through all the matches in say the following string:
<a href='/Product/Show/{ProductRowID}'>{ProductName}</a>
I am looking to capture the values in the {}, braces included, so I want {ProductRowID} and {ProductName}
Here is my code so far:
Dim r As Regex = New Regex("{\w*}", RegexOptions.IgnoreCase)
Dim m As Match = r.Match("<a href='/Product/Show/{ProductRowID}'>{ProductName}</a>")
Is my RegEx pattern correct? How do I loop through the matched values? I feel like this should be super easy but I have been stumped on this this morning!
Your pattern is nearly there. A more defensive version is:
\{\w*?\}
Escaping the curly braces is good practice, because { and } are also quantifier syntax. The lazy *? is harmless but not strictly required: \w cannot match }, so even a greedy \w* stops at the first closing brace.
Dim r As Regex = New Regex("\{\w*?\}")
Dim input As String = "<a href='/Product/Show/{ProductRowID}'>{ProductName}</a>"
'Call Matches on the Regex instance to get every match, not just the first.
Dim mc As MatchCollection = r.Matches(input)
For Each m As Match In mc
    MsgBox(m.Value)
Next m
RegexOptions.IgnoreCase is not needed, because this particular regex is not case sensitive anyway.
You can just group your matches using a regex like the following:
<a href='/Product/Show/(.+)'\>(.+)</a>
In this way you have $1 and $2 matching the values you want to get.
You can also give your groups names so that they aren't anonymous/position oriented for retrieval:
<a href='/Product/Show/(?<rowid>.+)'\>(?<name>.+)</a>
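Reading the named groups back could look like the sketch below. With the input string from the question, the captured values include the braces because the braces are part of the text between the fixed parts of the pattern. NamedGroupDemo and ReadPlaceholders are hypothetical names, and the unnecessary backslash before > has been dropped.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module NamedGroupDemo
    ' Hypothetical helper: returns the two captured values, or an empty
    ' array when the input does not match the pattern.
    Function ReadPlaceholders(input As String) As String()
        Dim pattern As String = "<a href='/Product/Show/(?<rowid>.+)'>(?<name>.+)</a>"
        Dim m As Match = Regex.Match(input, pattern)
        If Not m.Success Then
            Return New String() {}
        End If
        ' Groups are looked up by the names given in the pattern.
        Return New String() {m.Groups("rowid").Value, m.Groups("name").Value}
    End Function

    Sub Main()
        Dim input As String = "<a href='/Product/Show/{ProductRowID}'>{ProductName}</a>"
        For Each item As String In ReadPlaceholders(input)
            Console.WriteLine(item)
        Next
    End Sub
End Module
```

Unlike the \{\w*\} approach, this ties the match to the exact shape of the link, so it is a better fit when the surrounding markup is fixed and known in advance.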
Change your RegEx pattern to \{\w*\} then it will match as you expect.
You can test it with an online .net RegEx tester.