Regex: match everything inside brackets, including newlines

The script file content is:
//{input: x(width),y(height);
//output: z(area);}
function(x,y)
z=x*y
I need to read only the lines below. What regular expression matches the data inside the curly braces?
//{input: x(width),y(height);
//output: z(area);}
I tried the following:
Dim sr As StreamReader = New StreamReader(scriptpath)
' Dim textToParse As String
Dim scriptText As String
scriptText = sr.ReadToEnd
Dim extractCommentRegex As New Regex("\/\/\{(.*?)\}")
Dim textToParse As Match = extractCommentRegex.Match(scriptText)

Try this:
^\/\/\{.*\}
with the /s (Singleline) option to make the dot match newlines. (The /m option only changes where ^ and $ match.)
Hi, sorry, I haven't written VB for quite a while, so I did not make the answer clear enough. I created a console project to test the following code:
Dim sr As StreamReader = New StreamReader("d:\script1.txt")
' Dim textToParse As String
Dim scriptText As String
scriptText = sr.ReadToEnd
Dim match = Regex.Match(scriptText, "^\/\/\{.*\}", RegexOptions.Singleline Or RegexOptions.Multiline Or RegexOptions.IgnorePatternWhitespace)
Console.WriteLine(match.Success)
Dim sw As StreamWriter = New StreamWriter("d:\output.txt")
sw.Write(match.Value)
sw.Flush()
sw.Close()
Console.ReadLine()
And I get the following in output.txt:
//{input: x(width),y(height);
//output: z(area);}
I think you need to provide RegexOptions if the input file has Windows-format line endings (CR LF). For details of the issue, please see this thread:
.NET Regex dot character matches carriage return?
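One caveat with the pattern above: .* is greedy, so with Singleline it runs to the last } in the file. Here is a minimal sketch, assuming the header comment is the first {...} block and that you only want the text between the braces. The sample text is inlined here instead of being read from a file, and the module name is just a placeholder.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module ExtractHeaderSketch
    Sub Main()
        ' Sample script text, joined with CR LF as it would be in a Windows file.
        Dim scriptText As String = String.Join(vbCrLf,
            "//{input: x(width),y(height);",
            "//output: z(area);}",
            "function(x,y)",
            "z=x*y")

        ' Multiline: ^ matches at the start of any line.
        ' Singleline: . also matches the newline character.
        ' .*? is lazy, so the match stops at the first closing brace.
        Dim m As Match = Regex.Match(scriptText, "^//\{(.*?)\}",
                                     RegexOptions.Multiline Or RegexOptions.Singleline)

        If m.Success Then
            Console.WriteLine(m.Value)           ' whole comment block, braces included
            Console.WriteLine(m.Groups(1).Value) ' only the text between the braces
        End If
    End Sub
End Module
```

The capture group still contains the leading // of the second line, so strip that separately if you need clean field text.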

Related

Split 100-AA-1001A/B/C into Array 100-AA-1001A, 100-AA-1001B, 100-AA-1001C Using Regexp

I have the text 100-AA-1001A/B/C in a .txt file.
I would ideally like to be able to use a regular expression (or minimal VB coding) to split the text at the forward slash and include the 'prefix' to create an array of:
100-AA-1001A
100-AA-1001B
100-AA-1001C
I imagine it will be some kind of bracketing of the expression along the lines of:
Imports System
Imports System.Text.RegularExpressions
Sub RegexpSplitTxt()
Dim pattern As String = "(\d{3}-[A-Z]{2}-\d{4})[A-Z]?(\/[A-Z])?(\/[A-Z])?"
Dim replacement As String = "$2"
Dim input As String = "100-AA-1001A/B/C"
Dim result As String = Regexp ($1 Somewhere) & Regex.Replace(input, pattern, replacement)
Console.WriteLine(result)
End Sub
At the moment I am doing this manually in Excel, which is very time-consuming.
If the format is always as you have shown then you just need to split the string on the slashes and replace the last character with each of the parts from the split:
Dim s = "100-AA-1001A/B/C"
Dim parts = s.Split("/"c)
Dim derived As New List(Of String)
derived.Add(parts(0))
For i = 1 To parts.Count - 1
    derived.Add(parts(0).Remove(parts(0).Length - 1) & parts(i))
Next
Console.WriteLine(String.Join(vbCrLf, derived))
Console.ReadLine()
Outputs:
100-AA-1001A
100-AA-1001B
100-AA-1001C
You could get an array from derived with derived.ToArray() if you really need an array.
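If you would rather stay with a regular expression, as the question suggests, here is a rough sketch. It assumes the prefix always has the form shown (three digits, two letters, four digits) and that each suffix is a single uppercase letter. The module name is a placeholder. It relies on .NET keeping every repetition of a group in its Captures collection.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module RegexSplitSketch
    Sub Main()
        Dim input As String = "100-AA-1001A/B/C"

        ' Group 1: the shared prefix (assumed format: 3 digits, 2 letters, 4 digits).
        ' Group 2: the first suffix letter.
        ' Group 3: the letter of each following "/X" part; every repetition
        '          is kept in Groups(3).Captures.
        Dim m As Match = Regex.Match(input,
            "^(\d{3}-[A-Z]{2}-\d{4})([A-Z])(?:/([A-Z]))*$")

        Dim derived As New List(Of String)
        If m.Success Then
            Dim prefix As String = m.Groups(1).Value
            derived.Add(prefix & m.Groups(2).Value)
            For Each c As Capture In m.Groups(3).Captures
                derived.Add(prefix & c.Value)
            Next
        End If

        Console.WriteLine(String.Join(Environment.NewLine, derived))
    End Sub
End Module
```

The Split version above is simpler. The regex version mainly helps if you also want to reject lines that do not follow the expected format.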

Regex pattern to match period and pattern

I have a string that I am trying to write a regex for:
CODAA0870E - This an error string is not valid.
I wrote a regex COD[a-zA-Z0-9]*.....................................
but the length of the string can vary, i.e. the part after COD up to the period.
The regex needs to check for COD at the start and should end at the period.
The code I have written so far does not work:
Dim value As String = "daafasfasfCODAA0870E - This an error string is not valid.dfsfsfsfcCODAAvcv0870E - This an second error string is not valid.sdfsdf "
Dim pattern As String = "COD[^.]+\."
Dim array() As String = System.Text.RegularExpressions.Regex.Split(value, pattern)
You need this regex:
Dim pattern As String = "COD[^.]+\."
And to get all matches use:
Dim matches As MatchCollection = Regex.Matches(value, pattern)
I think you want something like this:
^COD[^.]*\.
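To make the difference between Regex.Split and Regex.Matches concrete, here is a small sketch using the sample string from the question. The module name is a placeholder.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module CodMatchesSketch
    Sub Main()
        Dim value As String = "daafasfasfCODAA0870E - This an error string is not valid.dfsfsfsfcCODAAvcv0870E - This an second error string is not valid.sdfsdf "
        Dim pattern As String = "COD[^.]+\."

        ' Regex.Split returns the text BETWEEN the matches, which is why the
        ' original attempt did not yield the error strings themselves.
        ' Regex.Matches returns the matched substrings.
        For Each m As Match In Regex.Matches(value, pattern)
            Console.WriteLine(m.Value)
        Next
    End Sub
End Module
```

The pattern is left unanchored here because COD occurs in the middle of the sample string. A pattern starting with ^ only matches when COD is at the very start of the input (or of a line, with RegexOptions.Multiline).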

Splitting a String into a List(Of T)

I have a data string that I want to split into a list of a class that parses the data into different properties in its constructor. Each block starts with an STX character and ends with the string "PLC" (I don't know why the vendor didn't use ETX).
So basically I need something that takes the String datastream, splits it at the string "PLC" (keeping it), and puts the pieces into dataList (a List(Of DataClass)).
The data stream looks like this:
STX1;0;0;0;0;1;0;0;0;0;0;+3272;-2145;+3273;-2145;PLC\r\nSTX1;0;0;0;0;1;0;0;0;0;0;+3276;-2145;+3272;-2145;PLC\r\nSTX1;0;0;0;0;1;0;0;0;0;0;+3281;-2145;+3272;-2145;PLC\r\n
and would result in three entries in a list(of dataclass):
STX1;0;0;0;0;1;0;0;0;0;0;+3272;-2145;+3273;-2145;PLC
STX1;0;0;0;0;1;0;0;0;0;0;+3276;-2145;+3272;-2145;PLC
STX1;0;0;0;0;1;0;0;0;0;0;+3281;-2145;+3272;-2145;PLC
I have looked and found lots of info on splitting strings in general, but nothing about putting the results into a class or list. I'm sure I could just do something like:
dim datalist as list(of dataclass)
dim splitdata() as string = datastream.split("PLC")
for each data as string in splitdata
datalist.Add(new dataclass(data))
next
but I'm sure there's a more efficient way (probably using regex or LINQ, but I'm not really familiar with either).
Thanks in advance!
Yes, a regular expression would do nicely for splitting the data into the pieces you show:
Imports System.Text.RegularExpressions
Module Module1
    Sub Main()
        Dim s = "STX;1;0;0;0;0;1;0;0;0;0;0;+3272;-2145;+3273;-2145;PLC\r\nSTX;1;0;0;0;0;1;0;0;0;0;0;+3276;-2145;+3272;-2145;PLC\r\nSTX;1;0;0;0;0;1;0;0;0;0;0;+3281;-2145;+3272;-2145;PLC"
        Dim re As New Regex("(STX;.*?;PLC)")
        Dim matches = re.Matches(s)
        If matches.Count > 0 Then
            For i = 0 To matches.Count - 1
                Console.WriteLine(matches(i).Value)
                'TODO: do whatever is required with matches(i)
            Next
        End If
        Console.ReadLine()
    End Sub
End Module
Outputs:
STX;1;0;0;0;0;1;0;0;0;0;0;+3272;-2145;+3273;-2145;PLC
STX;1;0;0;0;0;1;0;0;0;0;0;+3276;-2145;+3272;-2145;PLC
STX;1;0;0;0;0;1;0;0;0;0;0;+3281;-2145;+3272;-2145;PLC
In the above regex, the parentheses capture a group, the text parts STX; and ;PLC are literals to match, and the .*? matches anything (.) zero-or-more times (*) until the following text. The ? makes it "non-greedy". If it was greedy, it would match everything up until the final ;PLC and you would end up with the match being the whole line.
Edit
In the light of your comments, I suggest using the String.Split Method (String(), StringSplitOptions) overload:
Module Module1
    Sub Main()
        Dim s As String = "STX;1;0;0;0;0;1;0;0;0;0;0;+3272;-2145;+3273;-2145;PLC\r\nSTX;1;0;0;0;0;1;0;0;0;0;0;+3276;-2145;+3272;-2145;PLC\r\nSTX;1;0;0;0;0;1;0;0;0;0;0;+3281;-2145;+3272;-2145;PLC"
        ' transform the test string to its actual form
        s = s.Replace("\r\n", vbCrLf)
        ' split it into the required parts as an array
        Dim parts() As String = s.Split({vbCrLf}, StringSplitOptions.RemoveEmptyEntries)
        ' show the split worked as desired
        For i = 0 To parts.Length - 1
            Console.WriteLine(String.Format("Part {0}: {1}", i, parts(i)))
            'TODO: do something with parts(i)
        Next
        Console.ReadLine()
    End Sub
End Module
You didn't mention which version of VS you are using, so if the above complains about the line
Dim parts() As String = s.Split({vbCrLf}, StringSplitOptions.RemoveEmptyEntries)
then please replace it with
Dim splitAt() As String = {VbCrLf}
Dim parts() As String = s.Split(splitAt, StringSplitOptions.RemoveEmptyEntries)
Also, if the data is being read from a file then you can use the File.ReadAllLines Method to grab all the lines into an array in one go.
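To get from the split parts to the List(Of DataClass) the question asks for, something along these lines should work. This is only a sketch: DataClass here is a hypothetical stand-in that just stores the raw block, the module name is a placeholder, and the sample data is shortened.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports Microsoft.VisualBasic

' Hypothetical stand-in for the asker's class; the real constructor
' would parse the fields into separate properties.
Public Class DataClass
    Public Property Raw As String

    Public Sub New(block As String)
        Raw = block
    End Sub
End Class

Module DataListSketch
    Sub Main()
        ' Shortened sample data, each block terminated by CR LF.
        Dim datastream As String =
            "STX;1;0;+3272;-2145;PLC" & vbCrLf &
            "STX;1;0;+3276;-2145;PLC" & vbCrLf

        ' Split on the line terminator, then build one DataClass per block.
        Dim dataList As List(Of DataClass) =
            datastream.Split({vbCrLf}, StringSplitOptions.RemoveEmptyEntries).
                       Select(Function(block) New DataClass(block)).
                       ToList()

        For Each item In dataList
            Console.WriteLine(item.Raw)
        Next
    End Sub
End Module
```

Because the split is on the line terminator rather than on "PLC", each block keeps its trailing PLC marker.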

Remove all comments from a PHP source file

I would like to remove all comments from a PHP source file from within a VB.NET application. Another Stack Overflow question showed how to do this in C#.
I came up with this conversion, but unfortunately it does not work:
Dim blockComments As String = "/\*(.*?)\*/"
Dim lineComments As String = "//(.*?)\r?\n"
Dim strings As String = """((\\[^\n]|[^""\n])*)"""
Dim verbatimStrings As String = "#(""[^""]*"")+"
regex = New Regex(blockComments & "|" & lineComments)
srcT = regex.Replace(srcT, "")
You need to pass the flag RegexOptions.Singleline when constructing the Regex object. Otherwise, the block-comments can't span multiple lines.
regex = New Regex(blockComments & "|" & lineComments, RegexOptions.Singleline)
The . normally matches any character except newline (\n). The RegexOptions.Singleline flag makes it match any character, including newline.
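The strings pattern from the question is declared but never used, so a // inside a string literal (a URL, for example) would be stripped as if it were a comment. Below is a rough sketch of one way around that, using a MatchEvaluator so that string literals are matched but returned unchanged. It only handles double-quoted strings on one line. PHP's single-quoted strings, # comments and heredocs are not covered. The verbatimStrings pattern is left out, and the sample source text and the module name are made up for illustration.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module StripCommentsSketch
    Sub Main()
        Dim blockComments As String = "/\*(.*?)\*/"
        Dim lineComments As String = "//(.*?)\r?\n"
        Dim strings As String = """((\\[^\n]|[^""\n])*)"""

        ' Made-up sample input.
        Dim srcT As String =
            "<?php" & vbLf &
            "/* block" & vbLf &
            "   comment */" & vbLf &
            "$url = ""http://example.com""; // trailing comment" & vbLf &
            "echo $url;" & vbLf

        ' Strings are part of the alternation, so a // inside a string literal
        ' is consumed as part of the string instead of starting a comment.
        Dim commentRegex As New Regex(
            blockComments & "|" & lineComments & "|" & strings,
            RegexOptions.Singleline)

        Dim result As String = commentRegex.Replace(srcT,
            Function(m As Match) As String
                If m.Value.StartsWith("/*", StringComparison.Ordinal) Then Return ""
                If m.Value.StartsWith("//", StringComparison.Ordinal) Then Return vbLf ' keep the line break
                Return m.Value ' a string literal: leave it untouched
            End Function)

        Console.Write(result)
    End Sub
End Module
```

Returning a line break for line comments avoids joining the following line onto the code that preceded the comment.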

VBA Regular Expressions - Run-Time Error 91 when trying to replace characters in string

I am doing this task as part of a larger sub in order to massively reduce the workload for a different team.
I am trying to read in a string and use regular expressions to replace one or more consecutive spaces with a single space (or another character). At the moment I am using a local string; in the main sub, however, this data will come from an external .txt file. The number of spaces between elements in this .txt file can vary depending on the row.
I am using the code below, replacing the spaces with a dash. I have tried different variations and different logic, but I always get "Run-time error '91': Object variable or With block variable not set" on the line "c = re.Replace(s, replacement)".
After using breakpoints, I have found out that my RegularExpression (re) is empty, but I can't quite figure out how to progress from here. How do I replace my spaces with dashes? I have been at this problem for hours and spent most of that time on Google to see if someone has had a similar issue.
Sub testWC()
    Dim s As String
    Dim c As String
    Dim re As RegExp
    s = "hello World"
    Dim pattern As String
    pattern = "\s+"
    Dim replacement As String
    replacement = "-"
    c = re.Replace(s, replacement)
    Debug.Print (c)
End Sub
Extra information: I am using Excel 2010 and have successfully added the reference ("Microsoft VBScript Regular Expressions 5.5"). I was able to replace the spaces using the plain "Replace" function, but as the number of spaces between elements varies, I cannot use that to solve my issue.
Edit: My .txt file is not fixed-width either; a number of rows have different lengths, so I cannot use the MID function in Excel to dissect the string.
Please help
Thanks,
J.H.
You're not setting up the RegExp object correctly.
Dim pattern As String
pattern = "\s+" ' pattern is just a local string, not bound to the RegExp object!
You need to do this:
Dim re As RegExp
Set re = New RegExp
re.Pattern = "\s+" ' Now the pattern is bound to the RegExp object
re.Global = True ' Assuming you want to replace *all* matches
s = "hello World"
Dim replacement As String
replacement = "-"
c = re.Replace(s, replacement)
Try setting the pattern inside your RegExp object. Right now, re has no pattern assigned to it. Add re.Pattern = pattern after you initialize your pattern string.
You initialized the pattern but didn't actually hook it into the RegExp, so Replace would not know what to look for.
Also set re to a New RegExp. Without that, re is Nothing, which is what actually raises run-time error 91.
Sub testWC()
    Dim s As String
    Dim c As String
    Dim re As RegExp
    Set re = New RegExp
    s = "hello World"
    Dim pattern As String
    pattern = "\s+"
    re.Pattern = pattern
    Dim replacement As String
    replacement = "-"
    c = re.Replace(s, replacement)
    Debug.Print (c)
End Sub