VB.NET: identify keywords in a simple lexical analyser with regex

I have a VB.NET regex that I use to identify the operators in a simple sum such as z + x. How can I also identify the keywords in the expression using lexical analysis?
My current code:
Dim input As String = txtInput.Text
Dim symbol As String = "([-+*/])"
Dim substrings() As String = Regex.Split(input, symbol)
For Each match As String In substrings
    lstOutput.Items.Add(match) '<-- Do I need to add a string here to identify the regular expression?
Next
input: z + x
This is what I want the output to be:
z - keyword
+ - operator
x - keyword

Consider the following update to your code (as a Console project):
- operators holds the operator characters, so you can use it in the Regex pattern and refer to it again later;
- in the loop, if operators contains match, the match is an operator;
- anything else is a keyword.
So here's the code:
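Imports System.Text.RegularExpressions
...
Dim input As String = "z + x"
Dim operators As String = "-+*/" ' "-" first, so it is a literal inside [...]
Dim pattern As String = "([" & operators & "])"
Dim substrings() As String = Regex.Split(input, pattern)
For Each match As String In substrings
    ' Trim spaces and skip empty pieces: String.Contains("") is always True,
    ' so an empty piece would otherwise be reported as an operator.
    Dim token As String = match.Trim()
    If token.Length = 0 Then Continue For
    If operators.Contains(token) Then
        Console.WriteLine(token & " - operator")
    Else
        Console.WriteLine(token & " - keyword")
    End If
Next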
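If you later need more than single-character operators, a common alternative to Regex.Split is to describe each token kind as a named group and walk the matches with Regex.Matches. The sketch below assumes three token kinds (operators, names and integer literals) and labels z and x as "identifier", which is the usual lexer term for names; the module name, the Tokenize function and the labels are illustrative, not part of any library.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module LexerSketch
    ' One alternative per token kind; whitespace is never matched,
    ' so it simply does not appear in the results.
    Private ReadOnly TokenPattern As New Regex(
        "(?<op>[-+*/])|(?<id>[A-Za-z_]\w*)|(?<num>\d+)")

    ' Returns tokens formatted as "text - kind", in input order.
    Public Function Tokenize(ByVal input As String) As List(Of String)
        Dim tokens As New List(Of String)
        For Each m As Match In TokenPattern.Matches(input)
            If m.Groups("op").Success Then
                tokens.Add(m.Value & " - operator")
            ElseIf m.Groups("id").Success Then
                tokens.Add(m.Value & " - identifier")
            Else
                tokens.Add(m.Value & " - number")
            End If
        Next
        Return tokens
    End Function

    Sub Main()
        For Each token As String In Tokenize("z + x")
            Console.WriteLine(token)
        Next
    End Sub
End Module
```

Note that Regex.Matches silently skips characters no group accepts (for example %), so a stricter lexer would also check that consecutive matches are contiguous and report anything in between as an error.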

Related

Convert a substring caught by a regex to an integer and use it as a function argument in VB.Net

I've got a string such as:
Dim initialString As String = "Some text here is f(42,foo,bar) and maybe some other here."
And I want to replace the "f(42,foo,bar)" part with the result of a function with the following prototype:
Function myLittleFunction(ByVal number As Integer, Optional ByVal string0 As String = "NA0", Optional ByVal string1 As String = "NA1") As String
Which I tried to do with this regex:
finalString = System.Text.RegularExpressions.Regex.Replace(initialString, "f\((\d+),([a-zA-Z0-9_ ]+),([a-zA-Z0-9_ ]+)\)", myLittleFunction(Convert.ToUInt32("${1}"), "$2", "$3"))
But that doesn't work because Convert.ToUInt32("${1}") fails. If I replace it with any integer by hand and run the code, I get the correct evaluation and replacement in my string.
How can I correctly convert "$1" to the appropriate integer?
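Replacement patterns such as $1 and ${1} are only expanded inside the replacement string that Regex.Replace processes; they cannot be passed as arguments to a method. myLittleFunction(...) is evaluated once, before Regex.Replace even runs, so Convert.ToUInt32 receives the literal text "${1}" and throws a FormatException.
You can use a MatchEvaluator instead: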
Dim rx = New Regex("f\((\d+),([a-zA-Z0-9_\s]+),([a-zA-Z0-9_\s]+)\)")
Dim result = rx.Replace(initialString, New MatchEvaluator(Function(m As Match)
    ' Group 1 is converted here, after the match has been found
    Return myLittleFunction(Convert.ToInt32(m.Groups(1).Value), m.Groups(2).Value, m.Groups(3).Value)
End Function))
m is the Match object for the current match found by Regex.Replace: the evaluator is called once per match, and the string it returns replaces that match. You can access all the captured groups with m.Groups(N).Value.
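Below is a self-contained console sketch of the MatchEvaluator approach, passing a lambda directly to Regex.Replace. The body of myLittleFunction is a hypothetical placeholder (the question only gives its signature), and Integer.Parse is used for group 1 because \d+ guarantees digits; a very long digit run would still overflow Integer.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module EvaluatorSketch
    ' Hypothetical body: the question only shows the signature.
    Function myLittleFunction(ByVal number As Integer,
                              Optional ByVal string0 As String = "NA0",
                              Optional ByVal string1 As String = "NA1") As String
        Return String.Format("[{0}|{1}|{2}]", number * 2, string0, string1)
    End Function

    Function ReplaceCalls(ByVal text As String) As String
        Dim rx As New Regex("f\((\d+),([a-zA-Z0-9_\s]+),([a-zA-Z0-9_\s]+)\)")
        ' The lambda runs once per match, after the groups are captured,
        ' so group 1 can be parsed into a real Integer here.
        Return rx.Replace(text, Function(m As Match)
                                    Dim number As Integer = Integer.Parse(m.Groups(1).Value)
                                    Return myLittleFunction(number, m.Groups(2).Value, m.Groups(3).Value)
                                End Function)
    End Function

    Sub Main()
        Dim initialString As String = "Some text here is f(42,foo,bar) and maybe some other here."
        Console.WriteLine(ReplaceCalls(initialString))
    End Sub
End Module
```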


VB.NET separate strings using regex split?
I'm getting an error with the pattern string variable; it occurs after I extend the pattern from "(-)" to "(-)(+)(/)(*)".
Dim input As String = txtInput.Text
Dim pattern As String = "(-)(+)(/)(*)"
Dim substrings() As String = Regex.Split(input, pattern)
For Each match As String In substrings
    lstOutput.Items.Add(match)
Next
When my pattern string is "(-)", it works fine; this is my output:
input: dog-
output: dog
-
This is my desired output, but something is wrong with the code: it throws an error after I change the pattern to "(-)(+)(/)(*)", and also when I build it as
"(-)" + "(+)" + "(/)" + "(*)"
input: dog+cat/tree
output: dog
+
cat
/
tree
And when the input typed into the TextBox contains a space, the ListBox should show:
input: dog+cat/ tree
output: dog
+
cat
/
tree
You need a character class, not a sequence of subpatterns inside separate capturing groups:
Dim pattern As String = "([+/*-])"
This pattern will match and capture into Group 1 (and thus, all the captured values will be part of the resulting array) a char that is either a +, /, * or -. Note the position of the hyphen: since it is the last char in the character class, it is treated as a literal -, not a range operator.
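The desired output for "dog+cat/ tree" has no leading space on "tree", but Regex.Split keeps that space as part of the last piece, and it also returns an empty string when the input ends with an operator (as in "dog-"). A minimal sketch that trims each piece and skips empty ones; SplitTokens is a hypothetical helper name.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module SplitSketch
    ' Splits on + / * - while keeping the operators (capturing group),
    ' then trims every piece and drops pieces that are empty after trimming.
    Function SplitTokens(ByVal input As String) As List(Of String)
        Dim result As New List(Of String)
        For Each piece As String In Regex.Split(input, "([+/*-])")
            Dim trimmed As String = piece.Trim()
            If trimmed.Length > 0 Then result.Add(trimmed)
        Next
        Return result
    End Function

    Sub Main()
        ' In the WinForms version, add each token to lstOutput.Items instead.
        For Each token As String In SplitTokens("dog+cat/ tree")
            Console.WriteLine(token)
        Next
    End Sub
End Module
```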

Why is Regex.Match returning an empty string?

I just want to get the part of the string that matches the regular expression, but whether I use match.Value or groups, it always returns "". It's driving me crazy.
EDIT:
This worked:
Private Function NormalizeValue(ByVal fieldValue As String) As String
    Dim result As String = ""
    ' "+" rather than "*": every match must contain at least one character
    Dim pattern As String = "[a-zA-Zñ'-]+"
    Dim m As Match = Regex.Match(fieldValue, pattern)
    While m.Success
        result &= m.Value
        m = m.NextMatch()
    End While
    Return result
End Function
If your regex starts with ^ and ends with $, you are trying to match the whole string, not a part of it as you state in the question.
So you either need to remove them or rephrase your question.
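Another common cause of an empty Match.Value is a pattern that can match zero characters, such as [a-zA-Zñ'-]* (with * instead of +): Regex.Match then succeeds immediately with an empty match at index 0 whenever the input does not start with one of those characters. The sketch below contrasts * with + and uses Regex.Matches in place of the manual NextMatch loop; the helper names and the sample input are made up for illustration.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module EmptyMatchSketch
    ' Value of the first match only; this can be an empty string.
    Function FirstMatch(ByVal input As String, ByVal pattern As String) As String
        Return Regex.Match(input, pattern).Value
    End Function

    ' Concatenates every match, like the NormalizeValue loop in the question.
    Function AllMatches(ByVal input As String, ByVal pattern As String) As String
        Dim result As String = ""
        For Each m As Match In Regex.Matches(input, pattern)
            result &= m.Value
        Next
        Return result
    End Function

    Sub Main()
        Dim input As String = "12 año 34 ef"
        ' "*" allows a zero-length match, so the first match is "" at index 0.
        Console.WriteLine("Star: '" & FirstMatch(input, "[a-zA-Zñ'-]*") & "'")
        ' "+" needs at least one character, so the first match is real text.
        Console.WriteLine("Plus: '" & FirstMatch(input, "[a-zA-Zñ'-]+") & "'")
        Console.WriteLine("All: " & AllMatches(input, "[a-zA-Zñ'-]+"))
    End Sub
End Module
```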

Regular expression for splitting a string and add it to an Array

I have a string which is in the following format:
[0:2]={1.1,1,5.1.2}
My requirement here is to split the values inside the curly braces after the = operator and store them in a string array. I tried splitting it with the Substring() and IndexOf() methods, and that worked, but I want a cleaner, more elegant way to do it with regular expressions.
Does anybody have a clue how to achieve this?
Here is a pure regex solution:
Dim input As String = "[0:2]={1.1,1,5.1.2}"
Dim match = Regex.Match(input, "\[\d:\d\]={(?:([^,]+),)*([^,]+)}")
Dim results = match.Groups(1).Captures.Cast(Of Capture).Select(Function(c) c.Value).Concat(match.Groups(2).Captures.Cast(Of Capture).Select(Function(c) c.Value)).ToArray()
I don't think it is more readable than a standard split, though:
Dim startIndex = input.IndexOf("{"c) + 1
Dim length = input.Length - startIndex - 1
Dim results = input.Substring(startIndex, length).Split(","c)
You could use a regular expression to extract the values inside the curly braces, and then use an ordinary Split:
Regex.Match("[0:2]={1.1,1,5.1.2}", "{(.*)}").Groups(1).Value.Split(","c)
Dim s As String = "[0:2]={1.1,1,5.1.2}"
Dim separatorChar As Char = "="c
Dim commaChar As Char = ","c
Dim openBraceChar As Char = "{"c
Dim closeBraceChar As Char = "}"c
' Take the text after "=", strip the braces, then split on commas
Dim result() As String = s.Split(separatorChar)(1).Trim(openBraceChar, closeBraceChar).Split(commaChar)
(Assuming it works! I typed this on an iPad, so I can't easily verify the syntax, but the principle should be sound.)
EDIT: updated to VB syntax after being downvoted for showing working .NET methods in C# syntax.
If you want it using Regex:
Dim s() As String = Regex.Match(str, "=\{(.*)\}").Groups(1).Value.Split(","c)
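A self-contained sketch that wraps the "regex to find the braces, then Split" idea in a function and also handles input without braces. ExtractValues is a hypothetical name, and returning an empty array when nothing matches is an assumption about the desired behaviour.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module BraceSketch
    ' Captures everything between "={" and the next "}" and splits it on commas.
    Function ExtractValues(ByVal input As String) As String()
        Dim m As Match = Regex.Match(input, "=\{([^}]*)\}")
        If Not m.Success Then Return New String() {}
        Return m.Groups(1).Value.Split(","c)
    End Function

    Sub Main()
        For Each value As String In ExtractValues("[0:2]={1.1,1,5.1.2}")
            Console.WriteLine(value)
        Next
    End Sub
End Module
```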

Get/split text inside brackets/parentheses

I have a list of words, such as:
gram (g)
kilogram (kg)
pound (lb)
I'm wondering how to get the word within the brackets, for example the "g" in "gram (g)", and store it in a new string variable.
Possibly using regex?
Thanks.
Use the Split function:
Dim strArr As String() = str.Split("("c) ' splitting "gram (g)" returns {"gram ", "g)"}, indexes 0 and 1
Dim strArr2 As String() = strArr(1).Split(")"c) ' splitting "g)" returns {"g", ""}
The string is in
strArr2(0)
Edit
You want getAbbrev and getAbbrev2 to be arrays;
try
Dim getAbbrev As String() = Str.Split("("c)
Dim getAbbrev2 As String() = getAbbrev(1).Split(")"c)
To do it without declaring arrays, you can write
"gram (g)".Split("("c)(1).Split(")"c)(0)
but that's unreadable.
Edit
You have some very trivial errors. I would suggest you strengthen your understanding of objects and declarations first; then you can look into invoking methods. I would rather you understand it than just hand you the answer. Re-read the book you have or look for a basic tutorial.
Dim unit As String = "gram (g)" ' make sure this is the actual string you are getting; I'm not sure where you get the value from
Dim getAbbrev As String() = unit.Split("("c) ' use unit, not Str - Str is not a variable you declared
Dim getAbbrev2 As String() = getAbbrev(1).Split(")"c) ' VB indexes arrays with (), not []
For the last line, reference getAbbrev2 instead of the undefined abbrev2.
Fun with Regular Expressions (I'm really not an expert here, but I tested it and it works):
Imports System.Text.RegularExpressions
.....
Dim charsToTrim() As Char = {"("c, ")"c}
Dim test As String = "gram (g)" & Environment.NewLine &
                     "kilogram (kg)" & Environment.NewLine &
                     "pound (lb)"
Dim pattern As String = "\([a-zA-Z0-9]*\)"
Dim r As Regex = New Regex(pattern, RegexOptions.IgnoreCase)
Dim m As Match = r.Match(test)
While m.Success
    System.Diagnostics.Debug.WriteLine("Match=" & m.Value)
    ' Strip the surrounding parentheses from the matched text
    Dim tempText As String = m.Value.Trim(charsToTrim)
    System.Diagnostics.Debug.WriteLine("String Trimmed=" & tempText)
    m = m.NextMatch()
End While
You can split at the space and remove the parens from the second token (by replacing them with an empty string).
A regex is also an option, and a very simple one; its pattern is
\w+\s+\((\w+)\)
This means: a word, then at least one whitespace character, then an opening parenthesis, then (inside real, unescaped regex parentheses) a word, and finally a closing parenthesis. The inner parentheses form a capturing group, which makes it possible to refer to the unit g, kg or lb.
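A minimal sketch applying \w+\s+\((\w+)\) to the whole list with Regex.Matches and reading the unit from m.Groups(1).Value; GetUnits is a hypothetical helper name.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module UnitSketch
    ' Group 1 of \w+\s+\((\w+)\) holds the text inside the parentheses.
    Function GetUnits(ByVal text As String) As List(Of String)
        Dim units As New List(Of String)
        For Each m As Match In Regex.Matches(text, "\w+\s+\((\w+)\)")
            units.Add(m.Groups(1).Value)
        Next
        Return units
    End Function

    Sub Main()
        Dim lines As String = "gram (g)" & Environment.NewLine &
                              "kilogram (kg)" & Environment.NewLine &
                              "pound (lb)"
        For Each unit As String In GetUnits(lines)
            Console.WriteLine(unit)
        Next
    End Sub
End Module
```

Unlike the Trim-based loop, the capturing group returns the unit without parentheses directly, so no post-processing is needed.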