Extract variables from pattern matching - regex

I'm not a pattern-matching expert and I've been working on this for a few hours with no luck :/
I have an input string just like this:
Dim text As String = "32 Barcelona {GM C} 2 {*** Some ""cool"" text here}"
And I just want to extract 3 things:
Barcelona
GM C
*** Some "cool" text here
The pattern I'm trying is something like this:
Dim pattern As String = "^32\s(?<city>[^]].*\s)\{(?<titles>.*\})*"
Dim m As Match = Regex.Match(text, pattern)
If (m.Success) Then
Dim group1 As Group = m.Groups.Item("city")
Dim group2 As Group = m.Groups.Item("titles")
If group1.Success Then
MsgBox("City:" + group1.Value + ":", MsgBoxStyle.Information)
End If
If group2.Success Then
MsgBox(group2.Value, MsgBoxStyle.Information)
End If
Else
MsgBox("fail")
End If
But it's not working anyway :(
What should the pattern be to extract these 3 values?

^\d*(?<City>[A-Z a-z0-9]*)\s*\{(?<Titles>[A-Z a-z0-9]*)\}.*?\{(?<Cool>.*?)\}$
Seems to match your sample input.
Expresso is a great tool for designing regular expressions.
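Note that with the pattern above the City group also captures the spaces around the name, so you would need to Trim it. A possible variation that keeps those spaces out of the capture is sketched below. It assumes the input always has a leading number, a name, and two brace-delimited blocks; the module name and the [^{] / [^}] character classes are choices made for this illustration, not part of the answer above.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ExtractDemo
    Sub Main()
        Dim text As String = "32 Barcelona {GM C} 2 {*** Some ""cool"" text here}"
        ' The lazy City group followed by \s* leaves the surrounding spaces out of the capture.
        ' [^}]* stops each braced group at its own closing brace.
        Dim pattern As String = "^\d+\s+(?<City>[^{]+?)\s*\{(?<Titles>[^}]*)\}[^{]*\{(?<Cool>[^}]*)\}$"
        Dim m As Match = Regex.Match(text, pattern)
        If m.Success Then
            Console.WriteLine(m.Groups("City").Value)
            Console.WriteLine(m.Groups("Titles").Value)
            Console.WriteLine(m.Groups("Cool").Value)
        Else
            Console.WriteLine("no match")
        End If
    End Sub
End Module
```

For the sample string this should print the three requested values, one per line.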

Related

Extracting Lines of data from a string with RegEx

I have several strings, e.g.
(3)_(9)--(11).(FT-2)
(10)--(20).(10)/test--(99)
I am trying to use Regex.Match (this is the part I do not know) to get a list like this:
First sample:
3
_
9
--
11
.
FT-2
Second Sample:
10
--
20
.
10
/test--
99
So there are several numbers in brackets and any text between them.
Can anyone help me do this in VB.NET, so that a given string returns this list?
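One option is to use the Split method of String, splitting on both parenthesis characters and dropping the empty entries:
Dim parts = "(3)_(9)--(11).(FT-2)".Split({"("c, ")"c}, StringSplitOptions.RemoveEmptyEntries)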
Another option is to match everything except ( and ).
As a regex, [^()]+ would do this.
Breakdown
"[^()]" ' Match any single character NOT present in the list “()”
"+" ' Between one and unlimited times, as many times as possible, giving back as needed (greedy)
You can use the following block of code to extract all matches:
Try
Dim RegexObj As New Regex("[^()]+", RegexOptions.IgnoreCase)
Dim MatchResults As Match = RegexObj.Match(SubjectString)
While MatchResults.Success
' matched text: MatchResults.Value
' match start: MatchResults.Index
' match length: MatchResults.Length
MatchResults = MatchResults.NextMatch()
End While
Catch ex As ArgumentException
'Syntax error in the regular expression
End Try
This should work:
Dim input As String = "(3)_(9)--(11).(FT-2)"
Dim searchPattern As String = "\((?<keep>[^)]+)\)|(?<=\))(?<keep>[^()]+)"
Dim replacementPattern As String = "${keep}" + Environment.NewLine
Dim output As String = Regex.Replace(input, searchPattern, replacementPattern)
The simplest way is to use Regex.Split (formulated as a little console test):
Dim input = {"(3)_(9)--(11).(FT-2)", "(10)--(20).(10)/test--(99)"}
For Each s As String In input
Dim parts = Regex.Split(s, "\(|\)")
Console.WriteLine($"Input = {s}")
For Each p As String In parts
Console.WriteLine(p)
Next
Next
Console.ReadKey()
So basically we have a one-liner for the regex part.
The regular expression \(|\) means: split at ( or ), where the parentheses are escaped with \ because of their special meaning within regex.
The slightly shorter regex [()], where the desired characters are enclosed in [], would produce the same result.
Note that Regex.Split returns an empty string before a leading ( and after a trailing ), so filter those out if you only want the tokens.
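If you want the result as an actual list rather than console output, the matching approach can be wrapped in a small function. This is only a sketch: Tokenize and TokenizeDemo are hypothetical names chosen for illustration, and it assumes every token is simply a run of characters that contains no parenthesis.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module TokenizeDemo
    ' Hypothetical helper: returns every run of characters that is not a parenthesis.
    Function Tokenize(input As String) As List(Of String)
        Dim tokens As New List(Of String)
        For Each m As Match In Regex.Matches(input, "[^()]+")
            tokens.Add(m.Value)
        Next
        Return tokens
    End Function

    Sub Main()
        For Each s As String In {"(3)_(9)--(11).(FT-2)", "(10)--(20).(10)/test--(99)"}
            ' Join with a visible separator so each token is easy to tell apart.
            Console.WriteLine(String.Join(" | ", Tokenize(s)))
        Next
    End Sub
End Module
```

Because the pattern matches the tokens instead of splitting on the delimiters, no empty entries are produced.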

VBScript RegEx - match between words

I'm having a hard time coming up with a RegEx that works in VBScript. I'm trying to match all text between 2 keywords:
(?<=key)(.*)(?=Id)
This throws a RegEx error in VBScript.
Blob I'm matching against:
\"key\":[\"food\",\"real\",\"versus\",\"giant\",\"giant gummy\",\"diy candy\",\"candy\",\"gummy worm\",\"pizza\",\"fries\",\"spooky diy science\",\"spooky\",\"trapped\"],\"Id\"
Ideally, I'd end up with a comma delimited list like this:
food,real,versus,giant,giant gummy,diy candy,candy,gummy worm,pizza,fries,spooky diy science,spooky,trapped
but, I'd settle for all text between 2 keywords working in VBScript.
Thanks in advance!
VBScript's regular expression engine doesn't support lookbehind assertions, so you'll want to do something like this instead:
s = "\""key\"":[\""food\"",\""real\"",\""trapped\""],\""Id\"""
'remove backslashes and double quotes from string
s1 = Replace(s, "\", "")
s1 = Replace(s1, Chr(34), "")
Set re = New RegExp
re.Pattern = "key:\[(.*?)\],Id"
For Each m In re.Execute(s1)
list = m.Submatches(0)
Next
WScript.Echo list

Using Server Side VB to parse textbox contents from date range

I have a date range being entered into a textbox from a jQuery UI date range selector. I need to get the start date and end date values on postback. These values are provided in the textbox, but I don't know how to separate them on postback with VB server-side code. Can anyone show me how I can use VB to separate the start and end dates? The textbox contents are exactly as follows:
{"start":"2017-04-12","end":"2017-05-17"}
I tried using the following code, but it does not work:
Dim strDateStart as String
Dim strDateEnd as String
strDateStart = txtSearchDateRange.Text
strDateStart = Replace(strDateStart, "end*", "")
strDateEnd = txtSearchDateRange.Text
strDateEnd = Replace(strDateEnd, "start*", "")
Thanks to @Mederic, the following code works:
Dim value As String = txtSearchDateRange.Text
Dim strStartDate As String = ""
Dim strEndDate As String = ""
Dim i As Integer = 0
' Call Regex.Matches method.
Dim matches As MatchCollection = Regex.Matches(value, "\d{4}-\d{2}-\d{2}")
' Loop over matches.
For Each m As Match In matches
' Loop over captures.
For Each c As Capture In m.Captures
i = i + 1
' Display.
Console.WriteLine("Index={0}, Value={1}", c.Index, c.Value)
If i = 1 Then strStartDate = c.Value
If i = 2 Then strEndDate = c.Value
Next
Next
Response.Write("<BR><BR><BR><BR><BR><BR>Start Date:" & strStartDate & "<BR><BR>End Date:" & strEndDate)
Regex Approach:
A cleaner approach to the Regex using groups
First:
Imports System.Text.RegularExpressions
Then:
'Our regex
Dim regex As Regex = New Regex("(?<start>\d{4}-\d{2}-\d{2}).*(?<end>\d{4}-\d{2}-\d{2})")
'Match from textbox content
Dim match As Match = regex.Match(TextBox1.Text)
'If match is success
If match.Success Then
'Print start group
Console.WriteLine(match.Groups("start").Value)
'Print end group
Console.WriteLine(match.Groups("end").Value)
End If
Explanation of the Regex:
(?<start>REGEX) = Captures a group named start
(?<end>REGEX) = Captures a group named end
\d = Matches a digit
{X} = Matches exactly X occurrences
.* = Matches any characters (zero or more) between the two dates, so the first date lands in the start group and the second in the end group
Example:
\d{4} = Matches 4 digits
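Once the two groups are captured, they can be turned into real Date values. The sketch below assumes the textbox always delivers dates in yyyy-MM-dd form, as in the question; the module and variable names are made up for this example, and the hard-coded string stands in for the textbox contents.

```vbnet
Imports System
Imports System.Globalization
Imports System.Text.RegularExpressions

Module DateRangeDemo
    Sub Main()
        ' Sample value copied from the question; on the page it would come from the textbox.
        Dim raw As String = "{""start"":""2017-04-12"",""end"":""2017-05-17""}"
        Dim m As Match = Regex.Match(raw, "(?<start>\d{4}-\d{2}-\d{2}).*(?<end>\d{4}-\d{2}-\d{2})")

        Dim startDate, endDate As Date
        ' TryParseExact rejects strings that look like dates but are not valid (e.g. month 13).
        If m.Success AndAlso
           Date.TryParseExact(m.Groups("start").Value, "yyyy-MM-dd", CultureInfo.InvariantCulture, DateTimeStyles.None, startDate) AndAlso
           Date.TryParseExact(m.Groups("end").Value, "yyyy-MM-dd", CultureInfo.InvariantCulture, DateTimeStyles.None, endDate) Then
            Console.WriteLine("Start: {0:d}, End: {1:d}", startDate, endDate)
            Console.WriteLine("Days in range: {0}", (endDate - startDate).Days)
        Else
            Console.WriteLine("Could not read a date range")
        End If
    End Sub
End Module
```

Parsing with an explicit format and the invariant culture keeps the result independent of the server's regional settings.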
JSON approach:
A JSON approach is also possible, but slightly more awkward to implement, because one of the names in your JSON string, end, is a reserved keyword in VB.NET and cannot be used directly as a property name.
If you want to use JSON, you can import Newtonsoft.Json
and have a class such as:
Public Class Rootobject
Public Property start As String
<JsonProperty("end")> Public Property _end As String
End Class
And then deserialize like this:
Dim obj = JsonConvert.DeserializeObject(Of Rootobject)(TextBox1.Text)
The JsonProperty attribute maps the JSON name end onto the _end property. Alternatively, you can put DataContract on the class and DataMember on each property (with Name:="end" for _end) to handle the word end.
See the DataContract documentation on MSDN.

regex .NET to find and replace underscores only if found between > and <

I have a list of strings looking like this:
Title_in_Title_by_-_Mr._John_Doe
and I need to replace each _ with a space in the text between html"> and </a> ONLY,
so that the result looks like this:
Title in Title by - Mr. John Doe
I've tried to do it in 2 steps:
first isolate that part only with .*html">(.*)<\/a.* & ^.*>(.*)<.* & .*>.*<.* or ^.*>.*<.*
and then do the replace but the return is always unchanged and now I'm stuck.
Any help to accomplish this is much appreciated
I would .Split it and then .Replace it; there is no need for regex.
Dim line as string = "Title_in_Title_by_-_Mr._John_Doe"
Dim split as string() = line.split(">"c)
Dim correctString as String = split(1).replace("_"c," "c)
Boom done
See the String.Replace documentation for details.
Though if you had to use regex, this would probably be a better way of doing it:
Dim inputString = "Title_in_Title_by_-_Mr._John_Doe"
Dim reg As New Regex("(?<=\>).*?(?=\<)")
Dim correctString = reg.match(inputString).value.replace("_"c, " "c)
Dim line as string = "Title_and_Title_by_-_Mr._John_Doe"
line = Regex.Replace(line, "(?<=\.html"">)[^<>]+(?=</a>)", _
Function (m) m.Value.Replace("_", " "))
This uses a regex with lookarounds to isolate the title, and a MatchEvaluator delegate in the form of a lambda expression to replace the underscores in the title, then it plugs the result back into the string.
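To see the lookaround version run end to end, here is a small self-contained sketch. The href value in the sample line is invented for this illustration (the question does not show the full link markup), and the module name is hypothetical as well.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module UnderscoreDemo
    Sub Main()
        ' Hypothetical input: the href value is made up; only the link text comes from the question.
        Dim line As String = "<a href=""some_page.html"">Title_in_Title_by_-_Mr._John_Doe</a>"

        ' The lookbehind and lookahead isolate the link text without consuming the tags;
        ' the lambda (a MatchEvaluator) rewrites only the matched part.
        Dim result As String = Regex.Replace(line, "(?<=\.html"">)[^<>]+(?=</a>)",
                                             Function(m) m.Value.Replace("_", " "))
        Console.WriteLine(result)
    End Sub
End Module
```

The underscore inside the href should be left alone, since it lies outside the matched span, while the underscores in the link text become spaces.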

using regex to split Combobox entry into two variables

I am looking for some guidance. I have a combobox that is populated by concatenating values from two tables in a SQL database. Example of the text: "The Wild - 11/16/2014 2:00 AM". I am trying to have staA hold "The Wild" and staB hold "11/16/2014 2:00 AM". The length of the first part varies. I tried a traditional string split on " - ", but this only returns the first word. Next I tried a regex statement:
Dim input As String = strA
Dim pattern As String = "-"
Dim substring() As String = Regex.Split(input, pattern)
For Each match As String In substring
Console.WriteLine("'{0}'", match)
Next
but I am not sure how to verify that the split happened or how to access the results of the split.
I prefer to use groups to do things like this with regex.
This way you can detect the absence of the expected pattern to deal with other inputs.
Dim pattern As String = "^(?<a>[^-]+) - (?<b>[^-]+)$"
Dim m As Match = Regex.Match(input, pattern)
Dim a As String = ""
Dim b As String = ""
If m.Success Then
' Both parts were found; read them from the named groups.
a = m.Groups("a").Value
b = m.Groups("b").Value
End If
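As for why the plain split returned only the first word: on the .NET Framework with Option Strict Off, input.Split(" - ") most likely converts the string argument to a single Char (the leading space), so the text is split at every space. Passing the separator as a string array avoids that. The sketch below shows this next to a regex variant that also tolerates hyphens inside the first part; the module name and the relaxed .+ group are assumptions made for this example.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ComboSplitDemo
    Sub Main()
        Dim input As String = "The Wild - 11/16/2014 2:00 AM"

        ' Option 1: split on the whole " - " string, producing at most two parts.
        Dim parts As String() = input.Split(New String() {" - "}, 2, StringSplitOptions.None)
        If parts.Length = 2 Then
            Console.WriteLine("'{0}' / '{1}'", parts(0), parts(1))
        End If

        ' Option 2: the greedy .+ lets the first part contain hyphens of its own,
        ' while [^-]+ still requires the date part to be hyphen-free.
        Dim m As Match = Regex.Match(input, "^(?<a>.+) - (?<b>[^-]+)$")
        If m.Success Then
            Console.WriteLine("'{0}' / '{1}'", m.Groups("a").Value, m.Groups("b").Value)
        End If
    End Sub
End Module
```

Checking parts.Length or m.Success is how you verify that the split actually happened before reading the two values.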