Splitting a String into a List(Of T) - regex

I have a data string that I want to split into a list of a class that parses all the data into different properties in its constructor. Each block starts with an STX character and ends with the string "PLC" (I don't know why the vendor didn't use ETX).
So basically I want something that takes a String datastream, splits it at the string "PLC" (and keeps it), and puts the pieces into dataList(Of DataClass).
The data stream looks like this:
STX1;0;0;0;0;1;0;0;0;0;0;+3272;-2145;+3273;-2145;PLC\r\nSTX1;0;0;0;0;1;0;0;0;0;0;+3276;-2145;+3272;-2145;PLC\r\nSTX1;0;0;0;0;1;0;0;0;0;0;+3281;-2145;+3272;-2145;PLC\r\n
and would result in three entries in a list(of dataclass):
STX1;0;0;0;0;1;0;0;0;0;0;+3272;-2145;+3273;-2145;PLC
STX1;0;0;0;0;1;0;0;0;0;0;+3276;-2145;+3272;-2145;PLC
STX1;0;0;0;0;1;0;0;0;0;0;+3281;-2145;+3272;-2145;PLC
I have looked and I found lots of info on splitting strings in general but nothing about putting it into a class or list. I'm sure I could just do something like:
dim datalist as list(of dataclass)
dim splitdata() as string = datastream.split("PLC")
for each data as string in splitdata
datalist.Add(new dataclass(data))
next
but I'm sure there's a more efficient way (probably using regex or LINQ), though I'm not really familiar with either.
Thanks in advance!

Yes, a regular expression would do nicely for splitting the data into the pieces you show:
Imports System.Text.RegularExpressions
Module Module1
Sub Main()
Dim s = "STX;1;0;0;0;0;1;0;0;0;0;0;+3272;-2145;+3273;-2145;PLC\r\nSTX;1;0;0;0;0;1;0;0;0;0;0;+3276;-2145;+3272;-2145;PLC\r\nSTX;1;0;0;0;0;1;0;0;0;0;0;+3281;-2145;+3272;-2145;PLC"
Dim re As New Regex("(STX;.*?;PLC)")
Dim matches = re.Matches(s)
If matches.Count > 0 Then
For i = 0 To matches.Count - 1
Console.WriteLine(matches(i).Value)
'TODO: do whatever is required with matches(i)
Next
End If
Console.ReadLine()
End Sub
End Module
Outputs:
STX;1;0;0;0;0;1;0;0;0;0;0;+3272;-2145;+3273;-2145;PLC
STX;1;0;0;0;0;1;0;0;0;0;0;+3276;-2145;+3272;-2145;PLC
STX;1;0;0;0;0;1;0;0;0;0;0;+3281;-2145;+3272;-2145;PLC
In the above regex, the parentheses capture a group, the text parts STX; and ;PLC are literals to match, and the .*? matches anything (.) zero-or-more times (*) until the following text. The ? makes it "non-greedy". If it was greedy, it would match everything up until the final ;PLC and you would end up with the match being the whole line.
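To make the greedy versus non-greedy difference concrete, here is a minimal sketch that counts the matches each form produces on a shortened version of the test string. The module name GreedyVersusLazy and the helper CountMatches are made up for illustration, and the "\r\n" in the string is literal text, as in the example above.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module GreedyVersusLazy
    ' Hypothetical helper: counts how many times the pattern matches the input.
    Public Function CountMatches(input As String, pattern As String) As Integer
        Return Regex.Matches(input, pattern).Count
    End Function

    Sub Main()
        ' Two shortened records; "\r\n" is literal text here, not a real line break.
        Dim s As String = "STX;1;+3272;PLC\r\nSTX;1;+3276;PLC"

        ' Non-greedy: each match stops at the first ";PLC" it reaches (two matches).
        Console.WriteLine(CountMatches(s, "STX;.*?;PLC"))

        ' Greedy: a single match runs on to the last ";PLC" (one match).
        Console.WriteLine(CountMatches(s, "STX;.*;PLC"))
    End Sub
End Module
```

If the string held real line breaks instead of the literal "\r\n", the greedy form would stop at the end of each line, because . does not match a line feed by default.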
Edit
In the light of your comments, I suggest using the String.Split Method (String(), StringSplitOptions) overload:
Module Module1
Sub Main()
Dim s As String = "STX;1;0;0;0;0;1;0;0;0;0;0;+3272;-2145;+3273;-2145;PLC\r\nSTX;1;0;0;0;0;1;0;0;0;0;0;+3276;-2145;+3272;-2145;PLC\r\nSTX;1;0;0;0;0;1;0;0;0;0;0;+3281;-2145;+3272;-2145;PLC"
' transform the test string to its actual form
s = s.Replace("\r\n", vbCrLf)
' split it into the required parts as an array
Dim parts() As String = s.Split({vbCrLf}, StringSplitOptions.RemoveEmptyEntries)
' show the split worked as desired
For i = 0 To parts.Length - 1
Console.WriteLine(String.Format("Part {0}: {1}", i, parts(i)))
'TODO: do something with parts(i)
Next
Console.ReadLine()
End Sub
End Module
You didn't mention which version of VS you are using, so if the above complains about the line
Dim parts() As String = s.Split({vbCrLf}, StringSplitOptions.RemoveEmptyEntries)
then please replace it with
Dim splitAt() As String = {VbCrLf}
Dim parts() As String = s.Split(splitAt, StringSplitOptions.RemoveEmptyEntries)
Also, if the data is being read from a file then you can use the File.ReadAllLines Method to grab all the lines into an array in one go.
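To get from the split parts to the List(Of DataClass) asked about in the question, something along these lines might do. It is only a sketch: the DataClass shown here is a stand-in whose constructor just splits on semicolons, and ParseStream is a made-up name; the real class would do its own parsing.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports Microsoft.VisualBasic

' Hypothetical stand-in for the asker's class; the real constructor
' would parse the fields into its own properties.
Public Class DataClass
    Public ReadOnly Property Fields As String()

    Public Sub New(record As String)
        Fields = record.Split(";"c)
    End Sub
End Class

Module BuildDataList
    ' Splits the stream on CR/LF and wraps each non-empty record in a DataClass.
    Public Function ParseStream(dataStream As String) As List(Of DataClass)
        Return dataStream.
            Split({vbCrLf}, StringSplitOptions.RemoveEmptyEntries).
            Select(Function(record) New DataClass(record)).
            ToList()
    End Function

    Sub Main()
        Dim stream As String = "STX;1;0;+3272;PLC" & vbCrLf & "STX;1;0;+3276;PLC" & vbCrLf
        Dim dataList As List(Of DataClass) = ParseStream(stream)
        ' Number of records found in the stream
        Console.WriteLine(dataList.Count)
    End Sub
End Module
```

The read-only auto-property needs a reasonably recent compiler (VS 2015 or later); on older versions an ordinary property with a private setter would serve the same purpose.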

Related

Split 100-AA-1001A/B/C into Array 100-AA-1001A, 100-AA-1001B, 100-AA-1001C Using Regexp

I have the text 100-AA-1001A/B/C in a .txt file.
I would ideally like to be able to use a regular expression (or minimal VB coding) to split the text at the forward slash and include the 'prefix' to create an array of:
100-AA-1001A
100-AA-1001B
100-AA-1001C
I imagine it will be some kind of bracketing of the expression along the lines of:
Imports System
Imports System.Text.RegularExpressions
Sub RegexpSplitTxt()
Dim pattern As String = "(\d{3}-[A-Z]{2}-\d{4})[A-Z]?(\/[A-Z])?(\/[A-Z])?"
Dim replacement As String = "$2"
Dim input As String = "100-AA-1001A/B/C"
Dim result As String = Regexp ($1 Somewhere) & Regex.Replace(input, pattern, replacement)
Console.WriteLine(result)
End Sub
At the moment I am manually using Excel which is very time consuming.
If the format is always as you have shown then you just need to split the string on the slashes and replace the last character with each of the parts from the split:
Dim s = "100-AA-1001A/B/C"
Dim parts = s.Split("/"c)
Dim derived As New List(Of String)
derived.Add(parts(0))
For i = 1 To parts.Length - 1
derived.Add(parts(0).Remove(parts(0).Length - 1) & parts(i))
Next
Console.WriteLine(String.Join(vbCrLf, derived))
Console.ReadLine()
Outputs:
100-AA-1001A
100-AA-1001B
100-AA-1001C
You could get an array from derived with derived.ToArray() if you really need an array.
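Since the question asked about a regular expression, here is a sketch of a regex-based variant of the same idea. It assumes the input is a prefix followed by one capital letter and then zero or more /X suffixes; ExpandSuffixes and the module name are made up for illustration. It relies on .NET keeping every capture of a repeated group in Group.Captures.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module SuffixExpander
    Public Function ExpandSuffixes(input As String) As List(Of String)
        Dim result As New List(Of String)
        ' Group 1: shared prefix; group 2: first suffix letter;
        ' group 3: each further letter that follows a slash.
        Dim m As Match = Regex.Match(input, "^(.+?)([A-Z])(?:/([A-Z]))*$")
        If Not m.Success Then
            ' Not in the expected shape: hand the input back unchanged.
            result.Add(input)
            Return result
        End If
        Dim prefix As String = m.Groups(1).Value
        result.Add(prefix & m.Groups(2).Value)
        For Each c As Capture In m.Groups(3).Captures
            result.Add(prefix & c.Value)
        Next
        Return result
    End Function

    Sub Main()
        Console.WriteLine(String.Join(", ", ExpandSuffixes("100-AA-1001A/B/C")))
    End Sub
End Module
```

For a format this regular, the plain Split version above is probably easier to read; the regex mainly helps if the input also needs validating.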

RegEx - VBA Finding splitting cell with two Uppercase [duplicate]

I'm new to VBA and would like to seek some help with regard to using RegEx, and I hope someone can enlighten me on what I'm doing wrong. I'm currently trying to split a date into its individual day, month and year, and possible delimiters include ",", "-" and "/".
Function formattedDate(inputDate As String) As String
Dim dateString As String
Dim dateStringArray() As String
Dim day As Integer
Dim month As String
Dim year As Integer
Dim assembledDate As String
Dim monthNum As Integer
Dim tempArray() As String
Dim pattern As String()
Dim RegEx As Object
dateString = inputDate
Set RegEx = CreateObject("VBScript.RegExp")
pattern = "(/)|(,)|(-)"
dateStringArray() = RegEx.Split(dateString, pattern)
' .... code continues
This is what I am currently doing. However, something seems to go wrong in the RegEx.Split call, as it causes my code to hang and not process any further.
To just confirm, I did something simple:
MsgBox("Hi")
pattern = "(/)|(,)|(-)"
dateStringArray() = RegEx.Split(dateString, pattern)
MsgBox("Bye")
The "Hi" msgbox pops up, but the "Bye" msgbox never does, and the code further down doesn't seem to get executed at all, which led to my suspicion that RegEx.Split is causing it to get stuck.
Can I check if I'm actually using RegEx.Split the right way? According to MSDN here, Split(String, String) returns an array of strings as well.
Thank you!
Edit: I'm trying not to explore the CDate() function as I am trying not to depend on the locale settings of the user's computer.
To split a string with a regular expression in VBA:
Public Function SplitRe(Text As String, Pattern As String, Optional IgnoreCase As Boolean) As String()
Static re As Object
If re Is Nothing Then
Set re = CreateObject("VBScript.RegExp")
re.Global = True
re.MultiLine = True
End If
re.IgnoreCase = IgnoreCase
re.Pattern = Pattern
SplitRe = Strings.Split(re.Replace(text, ChrW(-1)), ChrW(-1))
End Function
Usage example:
Dim v
v = SplitRe("a,b/c;d", "[,;/]")
Splitting by a regex is definitely nontrivial to implement compared to other regex operations, so I don't blame you for being stumped!
If you wanted to implement it yourself, it helps to know that the Match objects returned by RegExp.Execute in Microsoft VBScript Regular Expressions 5.5 have a FirstIndex property and a Length property, so you can loop through the matches and pick out all the substrings between the end of one match (or the start of the string) and the start of the next match (or the end of the string).
If you don't want to implement it yourself, I've also implemented a RegexSplit UDF using those same RegExp objects on my GitHub.
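The loop described above can be sketched as follows. This is written in VB.NET rather than VBA, so it uses Match.Index where the VBScript Match object has FirstIndex; SplitByMatches and the module name are made up for illustration, and this is not the RegexSplit UDF mentioned above.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module ManualRegexSplit
    ' Collects the substrings that lie between consecutive matches.
    Public Function SplitByMatches(input As String, pattern As String) As List(Of String)
        Dim pieces As New List(Of String)
        Dim position As Integer = 0
        For Each m As Match In Regex.Matches(input, pattern)
            ' Text from the end of the previous match up to the start of this one
            pieces.Add(input.Substring(position, m.Index - position))
            position = m.Index + m.Length
        Next
        ' Whatever is left after the last match (or the whole string if none)
        pieces.Add(input.Substring(position))
        Return pieces
    End Function

    Sub Main()
        Console.WriteLine(String.Join("|", SplitByMatches("12/03-2017", "[,/-]")))
    End Sub
End Module
```

In VBA the same structure applies, with Mid$ in place of Substring and an adjustment for Mid$ being 1-based while FirstIndex is 0-based.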
Quoting an example from the documentation of VbScript Regexp:
https://msdn.microsoft.com/en-us/library/y27d2s18%28v=vs.84%29.aspx
Function SubMatchTest(inpStr)
Dim retStr
Dim oRe, oMatch, oMatches
Set oRe = New RegExp
' Look for an e-mail address (not a perfect RegExp)
oRe.Pattern = "(\w+)@(\w+)\.(\w+)"
' Get the Matches collection
Set oMatches = oRe.Execute(inpStr)
' Get the first item in the Matches collection
Set oMatch = oMatches(0)
' Create the results string.
' The Match object is the entire match - dragon@xyzzy.com
retStr = "Email address is: " & oMatch & vbNewLine
' Get the sub-matched parts of the address.
retStr = retStr & "Email alias is: " & oMatch.SubMatches(0) ' dragon
retStr = retStr & vbNewLine
retStr = retStr & "Organization is: " & oMatch.SubMatches(1) ' xyzzy
SubMatchTest = retStr
End Function
To test, call:
MsgBox(SubMatchTest("Please send mail to dragon@xyzzy.com. Thanks!"))
In short, you need your Pattern to match the various parts you want to extract, with the separators in between (the hyphen goes last inside the brackets so that it is taken literally rather than as a range), maybe something like:
"(\d+)[/,-](\d+)[/,-](\d+)"
The whole thing will be in oMatch, while the numbers (\d+) will end up in oMatch.SubMatches(0) to oMatch.SubMatches(2).
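For comparison, here is roughly how the same capture-group idea looks in VB.NET, with the hyphen placed last in the character class so that it is read literally. In .NET the groups are numbered from 1, so Groups(1) to Groups(3) play the role of SubMatches(0) to SubMatches(2). SplitDate and the module name are made up for illustration.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module DateParts
    ' Returns the three numeric parts of the date, or an empty array if there is no match.
    Public Function SplitDate(inputDate As String) As String()
        Dim m As Match = Regex.Match(inputDate, "(\d+)[/,-](\d+)[/,-](\d+)")
        If Not m.Success Then Return New String() {}
        Return New String() {m.Groups(1).Value, m.Groups(2).Value, m.Groups(3).Value}
    End Function

    Sub Main()
        Console.WriteLine(String.Join(" ", SplitDate("12/03-2017")))
    End Sub
End Module
```

This only handles numeric parts; a month given by name would need a different pattern for the middle group.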

Using Server Side VB to parse textbox contents from date range

I have a date range being input into a textbox from a jQuery UI date range selector. I need to get the values of the start date and end date on postback. These values are provided in the textbox, but I don't know how to separate them out on postback with VB server-side code. Can anyone show me how I can use vbscript to separate the start and end dates? The textbox results are exactly as follows:
{"start":"2017-04-12","end":"2017-05-17"}
I tried using the following code, but it does not work
Dim strDateStart as String
Dim strDateEnd as String
strDateStart = txtSearchDateRange.Text
strDateStart = Replace(strDateStart, "end*", "")
strDateEnd = txtSearchDateRange.Text
strDateEnd = Replace(strDateEnd, "start*", "")
Thanks to @Mederic, the following code works:
Dim value As String = txtSearchDateRange.Text
Dim strStartDate As String = ""
Dim strEndDate As String = ""
Dim i As Integer = 0
' Call Regex.Matches method.
Dim matches As MatchCollection = Regex.Matches(value, "\d{4}-\d{2}-\d{2}")
' Loop over matches.
For Each m As Match In matches
' Loop over captures.
For Each c As Capture In m.Captures
i = i + 1
' Display.
Console.WriteLine("Index={0}, Value={1}", c.Index, c.Value)
If i = 1 Then strStartDate = c.Value
If i = 2 Then strEndDate = c.Value
Next
Next
Response.Write("<BR><BR><BR><BR><BR><BR>Start Date:" & strStartDate & "<BR><BR>End Date:" & strEndDate)
Regex Approach:
A cleaner approach to the Regex using groups
First:
Imports System.Text.RegularExpressions
Then:
'Our regex
Dim regex As Regex = New Regex("(?<start>\d{4}-\d{2}-\d{2}).*(?<end>\d{4}-\d{2}-\d{2})")
'Match from textbox content
Dim match As Match = regex.Match(TextBox1.Text)
'If match is success
If match.Success Then
'Print start group
Console.WriteLine(match.Groups("start").Value)
'Print end group
Console.WriteLine(match.Groups("end").Value)
End If
Explanation of the Regex:
(?<start>REGEX) = Captures a group named start
(?<end>REGEX) = Captures a group named end
\d = Matches a digit
{X} = Matches exactly X occurrences
.* = Matches any characters (zero or more) between the two dates
Example:
\d{4} = Matches 4 digits
Json Approach
A JSON approach would be possible but, I think, a bit more complex to implement, as you have an illegal name in your JSON string: end (End is a reserved word in VB).
But if you wanted to use Json you could import Newtonsoft.Json
And have a class as:
Public Class Rootobject
Public Property start As String
Public Property _end As String
End Class
And then deserialize like this:
Dim obj = JsonConvert.DeserializeObject(Of Rootobject)(TextBox1.Text)
However, you would need to apply the DataContract and DataMember attributes to handle the word end, so that the JSON name end is mapped onto the _end property.
See: DataContract on MSDN
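As a rough sketch of the attribute mapping, the following uses the framework's own DataContractJsonSerializer rather than Newtonsoft.Json, purely so that it runs without an extra package (a project reference to System.Runtime.Serialization may be needed). The names DateRange, StartDate, EndDate and ParseRange are made up for illustration. The same attributed class should also work with JsonConvert.DeserializeObject, since Json.NET honours DataContract and DataMember.

```vbnet
Imports System
Imports System.IO
Imports System.Runtime.Serialization
Imports System.Runtime.Serialization.Json
Imports System.Text

' Hypothetical class: DataMember maps the JSON names onto legal VB property names.
<DataContract>
Public Class DateRange
    <DataMember(Name:="start")>
    Public Property StartDate As String

    <DataMember(Name:="end")>
    Public Property EndDate As String
End Class

Module JsonDateRange
    ' Deserializes the textbox content into a DateRange.
    Public Function ParseRange(json As String) As DateRange
        Dim serializer As New DataContractJsonSerializer(GetType(DateRange))
        Using stream As New MemoryStream(Encoding.UTF8.GetBytes(json))
            Return DirectCast(serializer.ReadObject(stream), DateRange)
        End Using
    End Function

    Sub Main()
        Dim range As DateRange = ParseRange("{""start"":""2017-04-12"",""end"":""2017-05-17""}")
        Console.WriteLine(range.StartDate)
        Console.WriteLine(range.EndDate)
    End Sub
End Module
```

The dates are kept as strings here; converting them with Date.ParseExact and the "yyyy-MM-dd" format would avoid any dependence on the server's locale.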

Remove tweet regular expressions from string of text

I have an Excel sheet filled with tweets. Several entries contain @blah-type strings, among other things. I need to keep the rest of the text and remove the @blah part. For example, "@villos hey dude" needs to be transformed into "hey dude". This is what I've done so far.
Sub Macro1()
'
' Macro1 Macro
'
Dim counter As Integer
Dim strIN As String
Dim newstring As String
For counter = 1 To 46
Cells(counter, "E").Select
ActiveCell.FormulaR1C1 = strIN
StripChars (strIN)
newstring = StripChars(strIN)
ActiveCell.FormulaR1C1 = StripChars(strIN)
Next counter
End Sub
Function StripChars(strIN As String) As String
Dim objRegex As Object
Set objRegex = CreateObject("vbscript.regexp")
With objRegex
.Pattern = "^@?(\w){1,15}$"
.ignorecase = True
StripChars = .Replace(strIN, vbNullString)
End With
End Function
Moreover there are also entries like this one: Ÿ³é‡ï¼Ÿã€€åˆã‚ã¦çŸ¥ã‚Šã¾ã—ãŸã€‚ shiftã—ãªãŒã‚‰ã‚¨ã‚¯ã‚¹ãƒ
I need them gone too! Ideas?
For every line in the spreadsheet, run the following regex on it: ^(@.+?)\s+?(.*)$
If the line matches the regex, the information you are interested in will be in the second capturing group (groups are usually zero-indexed, but position 0 contains the entire match). The first capturing group will contain the Twitter handle if you need that too.
Regex demo here.
However, this will not match tweets that are not replies (replies are the ones starting with @). In that situation, the only way to distinguish between regular tweets and the junk you are not interested in is to restrict the tweet to alphanumerics, but this may mean some tweets are missed if they contain any non-alphanumeric characters. The following regex will work if that is not an issue for you:
^(?:(@.+?)\s+?)?([\w\t ]+)$
Demo 2.
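If only the leading handle needs to go, a single Replace may be enough. The sketch below is in VB.NET rather than VBA, and it treats a leading token starting with either @ or # as the handle, which is an assumption on my part; StripLeadingHandle and the module name are made up for illustration.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module TweetCleaner
    ' Removes one leading token that starts with @ or #, plus the whitespace after it.
    Public Function StripLeadingHandle(tweet As String) As String
        Return Regex.Replace(tweet, "^[@#]\S+\s+", "")
    End Function

    Sub Main()
        Console.WriteLine(StripLeadingHandle("@villos hey dude"))
        Console.WriteLine(StripLeadingHandle("hey dude"))
    End Sub
End Module
```

A tweet without a leading handle is returned unchanged, so the function can be applied to every row. The same pattern should carry over to VBScript.RegExp, since it uses nothing beyond a character class, \S and \s.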

excel regex end of line

I am looking for a regex for excel 2007 that can replace all instances of -3 ONLY at the end of the string, replacing it with absolutely nothing (removing it). There are instances of -3 throughout the strings, however I need to remove only the ones at the end. This is being integrated into a macro, so find and replace using a single regex is preferred.
You can do this without Regex by using VBA's InStr function. Here is the code:
Sub ReplaceIt()
    Dim myRange As Range
    Set myRange = Range("A1") ' change as needed
    ' Start the search at the last two characters; skip cells that are too short
    If Len(myRange.Text) >= 2 Then
        If InStr(Len(myRange.Text) - 1, myRange.Text, "-3") > 0 Then
            myRange.Value = Left(myRange.Text, Len(myRange.Text) - 2)
        End If
    End If
End Sub
Update
Based on Juri's comment below, changing the If statement to this will also work, and it's a bit cleaner.
If Right (MyRange, 2) = "-3" Then MyRange=Left(MyRange, Len(MyRange)-2)
Please try the following:-
Edit as per OP's comments:
Sub mymacro()
    Dim myString As String
    '-- do stuff
    '-- you could just do this, or save the returned
    '-- string to another string for further processing :)
    MsgBox replaceAllNeg3s(myString)
End Sub
Function replaceAllNeg3s(ByRef urstring As String) As String
    Dim regex As Object
    Dim strText As String
    strText = urstring
    Set regex = CreateObject("VBScript.RegExp")
    With regex
        '-- match one or more consecutive -3s at the end of the string
        .Pattern = "(-3)+$"
        .Global = True
        If .Test(strText) Then
            '-- Left(strText, Len(strText) - 2) would drop a single -3;
            '-- in fact you can use Replace to drop the whole trailing run
            replaceAllNeg3s = Trim(.Replace(strText, ""))
        Else
            replaceAllNeg3s = strText
        End If
    End With
End Function
'-- tested for
'-- e.g. thistr25ing is -3-3-3-3
'-- e.g. 25this-3stringis25someting-3-3
'-- e.g. this-3-3-3stringis25something-5
'-- e.g. -3this-3-3-3stringis25something-3
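If only a single trailing -3 should be removed, as the question describes, anchoring the pattern with $ and leaving out the repetition is enough. Here is a small sketch in VB.NET (not VBA); RemoveTrailingNeg3 and the module name are made up for illustration.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module TrailingSuffix
    ' Removes one "-3" only when it sits at the very end of the string.
    Public Function RemoveTrailingNeg3(input As String) As String
        Return Regex.Replace(input, "-3$", "")
    End Function

    Sub Main()
        Console.WriteLine(RemoveTrailingNeg3("part-3-x-3"))
        Console.WriteLine(RemoveTrailingNeg3("part-3-x"))
    End Sub
End Module
```

The pattern "-3$" should behave the same way with VBScript.RegExp in a macro, as long as MultiLine is left off so that $ means the end of the whole string.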
Unless it's part of a bigger macro, there's no need for VBA here! Simply use this formula and you'll get the result:
=IF(RIGHT(A1,2)="-3",LEFT(A1,LEN(A1)-2),A1)
(assuming that your text is in cell A1)