How to capture value between two strings in VB.NET - regex

i'm trying to capture a value between two strings using VB.NET
Each line from the file i'm reading in from can contain many different parameters, in any order, and I'd like to store the values of these parameters in their own variables. Two sample lines would be:
identifier="121" messagecount="112358" timestamp="11:31:41.622" column="5" row="98" colour="ORANGE" value="Hello"
or it could be:
identifier="1121" messagecount="1123488" timestamp="19:14:41.568" valid="true" state="running"
Also, this may not be the sole text in the string, there may be other values before and after (and in between) the parameters i would like to capture.
So essentially i'd need to store everything between 'identifier="' and it's closing '"' into an identifier variable, and so on... As the order of these parameters within each line can change, i can't simply stick the first value in one variable each time, I have to refer to them specifically by what their name is (identifier, messagecount) etc.
Can anyone help? Thanks. I guess it would be via a regular expression, but i'm not too hot on those. I'd prefer to have each expression for each paramater within it's own statement, rather than being all in one, thanks.

Here is a sample how you can go about that. It converts one line into a dictionary.
This will capture any string consisting of a-z-characters (case-insensitive) as the attribute name, and then catch any character other than " in the value string. (If " can occur in the string as "" you need to add some treatment for that.)
Imports System.Text.RegularExpressions
[...]
Dim s As String =
"identifier=""121"" messagecount=""112358"" " &
"timestamp=""11:31:41.622"" column=""5"" row=""98"" " &
"colour=""ORANGE"" value=""Hello"""
Dim d As New Dictionary(Of String, String)
Dim rx As New Regex("([a-z]+)=""(.*?)""", RegexOptions.IgnoreCase)
Dim rxM As MatchCollection = rx.Matches(s)
For Each M As Match In rxM
d.Add(M.Groups(1).Value, M.Groups(2).Value)
Next
' Dictionary is ready
' test output
For Each k As String In d.Keys
MsgBox(String.Format("{0} => {1}", k, d(k)))
Next

You just need to split the data into manageable clumps, and then go through it. Something like this to start you off.
Private Sub ProcessMyData(LineOfData As String)
' NOTE! This assumes all your 'names' have no spaces in!
Dim vElements = LineOfData.Split({" "c}, StringSplitOptions.RemoveEmptyEntries)
For Each vElement In vElements
Dim vPair = vElement.Split({"="c})
Dim vResult = vPair(1).Trim(Convert.ToChar(34))
Select Case vPair(0).ToLower
Case "identifier"
MyIDVariable = CInt(vResult)
Case "colour"
MyColourVariable = vResult
' etc., etc.
End Select
Next
End Sub
You can define the variables you want locally in the sub [function], and then return a list/dictionary/custom class of the things you're interested in.

Related

Changing formulas on the fly with VBA RegEx

i'm trying to change formulas in excel, i need to change the row number of the formulas.
I'm trying do use replace regex to do this. I use an loop to iterate through the rows of the excel and need to change the formula for the row that is iterating at the time. Here is an exemple of the code:
For i = 2 To rows_aux
DoEvents
Formula_string= "=IFS(N19='Z001';'xxxxxx';N19='Z007';'xxxxxx';0=0;'xxxxxxx')"
Formula_string_new = regEx.Replace(Formula_string, "$1" & i)
wb.Cells(i, 33) = ""
wb.Cells(i, 33).Formula = Formula_string_new
.
.
.
Next i
I need to replace rows references but not the ones in quotes or double quotes. Example:
If i = 2 i want the new string to be this:
"=IFS(N2='Z001';'xxxxxx';N2='Z007';'xxxxxx';0=0;'xxxxxxx')"
I'm trying to use this regex:
([a-zA-Z]+)(\d+)
But its changing everything in quotes too. Like this:
If i = 2:
"=IFS(N2='Z2';'xxxxxx';N2='Z2';'xxxxxx';0=0;'xxxxxxx')"
If anyone can help me i will be very grateful!
Thanks in advance.
As others have written, there are probably better ways to write this code. But for a regex that will capture just the Column letter in capturing group #1, try:
\$?\b(XF[A-D]|X[A-E][A-Z]|[A-W][A-Z]{2}|[A-Z]{2}|[A-Z])\$?(?:104857[0-6]|10485[0-6]\d|1048[0-4]\d{2}|104[0-7]\d{3}|10[0-3]\d{4}|[1-9]\d{1,5}|[1-9])d?
Note that is will NOT include the $ absolute addressing token, but could be altered if that were necessary.
Note that you can avoid the loop completely with:
Formula_string = "=IFS(N19=""Z001"",""xxxxxx"",N$19=""Z007"",""xxxxxx"",0=0,""xxxxxxx"")"
Formula_string_new = regEx.Replace(Formula_string, "$1" & firstRow)
With Range(wb.Cells(firstRow, 33), wb.Cells(lastRow, 33))
.Clear
.Formula = Formula_string_new
End With
When we write a formula to a range like this, the references will automatically adjust the way you were doing in your loop.
Depending on unstated factors, you may want to use the FormulaLocal property vice the Formula property.
Edit:
To make this a little more robust, in case there happens to be, within the quote marks, a string that exactly mimics a valid address, you can try checking to be certain that a quote (single or double) neither precedes nor follows the target.
Pattern: ([^"'])\$?\b(XF[A-D]|X[A-E][A-Z]|[A-W][A-Z]{2}|[A-Z]{2}|[A-Z])\$?(?:104857[0-6]|10485[0-6]\d|1048[0-4]\d{2}|104[0-7]\d{3}|10[0-3]\d{4}|[1-9]\d{1,5}|[1-9])d?\b(?!['"])
Replace: "$1$2" & i
However, this is not "bulletproof" as various combinations of included data might match. If it is a problem, let me know and I'll come up with something more robust.
If you can identify some unique features like in the example preceding bracket ( or colon ; and trailing equal = then this might work
Sub test()
Dim s As String, sNew As String, i As Long
Dim Regex As Object
Set Regex = CreateObject("vbscript.regexp")
With Regex
.Global = True
.MultiLine = False
.IgnoreCase = True
.Pattern = "([(;][a-zA-Z]{1,3})(\d+)="
End With
i = 1
s = "=IFS(NANA19='Z001';'xxxxxx';NA19='Z007';'xxxxxx';0=0;'xxxxxxx')"
sNew = Regex.Replace(s, "$1" & i & "=")
Debug.Print s & vbCr & sNew
End Sub

add datetime as string to a string after matching a pattern in vb.net

I have this string for example: "Example_string.xml"
and i would like to add before the "." _DateTime of now so it will be like:
"Example_string_20151808185631.xml"
How can i achieve it? regex?
Yes, you can achieve that through the use of a look ahead. For instance:
Dim result As String = Regex.Replace("Example_string.xml", "(?=\.)", "_20151808185631")
Since the pattern only matches a position in the string (the position just before the period), rather than matching a portion of the text, the replace method doesn't actually replace any of the input text. It effectively just inserts the replacement text into that position in the string.
Alternatively, if you find that confusing, you could just match the period and then just include the period in the replacement text:
Dim result As String = Regex.Replace("Example_string.xml", "\.", "_20151808185631.")
If you don't want to just look for any period, and you want to be more safe about it (such as handling file names that contain multiple periods, then instead of \., you could use something like \.\w+$. However, if you need to make it that resilient, and it doesn't have to be done with RegEx, it would be better to use the Path.GetFileNameWithoutExtension and Path.GetExtension methods, as recommended by Crowcoder. For instance, you may also need to make it handle file names that have no extension, which even further complicates it.
or...
Path.GetFileNameWithoutExtension("Example_string.xml") + "_20151808185631" + Path.GetExtension("Example_string.xml")
How about:
Dim sFile As String = "Example_string.xml"
Dim sResult As String = sFile.ToLower.Replace(".xml", "_" & Format(Now(), "yyyyMMddHHmmss") & ".xml")
MsgBox(sresult, , sFile)

How do you use RegEx to return a parsed value?

I have a data column that has a heading value with multiple levels, where I only want the first three levels, but I cannot figure out how to get the parsed value?
I was reading this and it shows how to use create a function to return a boolean for the condition, but how would I create a function that would return a parsed value?
This is the Regular Expression that I think I need.
^(\d.\d.\d)
I'm looking for something that would change 1.2.3.4.5. to 1.2.3 and similar for any other header I have that has more than three levels.
Ideally, I'd like to be able to put it into my Query Design as a Field Expression, but I'm not sure how I would do that.
I assumed your input values could have more than one digit between the dots. In other words, I think you want this ...
? RegExpGetMatch("1.2.3.4.5.", "^(\d+\.\d+\.\d+).*", 1)
1.2.3
? RegExpGetMatch("1.27.3.4.5.", "^(\d+\.\d+\.\d+).*", 1)
1.27.3
If that is the correct behavior, here is the function I used.
Public Function RegExpGetMatch(ByVal pSource As String, _
ByVal pPattern As String, _
ByVal pGroup As Long) As String
'requires reference to Microsoft VBScript Regular Expressions
'Dim re As RegExp
'Set re = New RegExp
'late binding; no reference needed
Dim re As Object
Set re = CreateObject("VBScript.RegExp")
re.Global = True
re.Pattern = pPattern
RegExpGetMatch = re.Replace(pSource, "$" & pGroup)
Set re = Nothing
End Function
See also this answer by KazJaw. His answer taught me how to select the match group with RegExp.Replace.
In a query run within an Access session, you could use the function like this:
SELECT
RegExpGetMatch([Data Column], "^(\d+\.\d+\.\d+).*", 1) AS parsed_value
FROM YourTable;
Note however a custom VBA function is not usable for queries run from outside an Access session.
Try changing your RegEx to ^(\d\.\d\.\d). You need to escape the . since it has a special meaning in RegExp.

MS Word + VBA + RegExp: Get Page Number of Match

Is that possible? Probably not? How can I then find all exact occurrences of a match and the according page numbers?
EDIT:
I have the regex working properly. What I need is for each match to get all the pages it appears on.
Example:
regex = \b\d{3}\b
123 appears on page 1,4,20
243 appear on page 3,5,7
523 appears on page 9
How can I get that information (all the pages a match occurs on?)
This is for creating some kind of index automatically.
EDIT 2:
I got a basic working version, snippet:
Set Matches = regExp.Execute(ActiveDocument.range.Text)
For Each Match In Matches
Set range = ActiveDocument.range(Match.FirstIndex, Match.FirstIndex + Len(Match.Value))
page = range.Information(wdActiveEndAdjustedPageNumber)
The problem is that Match.FirstIndex does not always point to the first character of the match in ActiveDocument.range. Word tables mess this up as ActiveDocument.range.Text contains characters that are not on the text put represent something in the table.
I think this probably fits better in SuperUser.
The answer to the question is "yes."
Selection.Information(wdActiveEndAdjustedPageNumber)
The above property in VBA will get you the page number of a selection.
Also, VBA can do some regular expression work.
This turned out to be rather complex and I can't say if my solution works for any document. The main issue is as indicated in the Question, that RegexMatch.FirstIndex can not be used to determine were the actually Match is within the MS Word Document. This is due to the fact that regex matching is done on range.Text property (String) and that string just contains different amount of characters than the range object does and hence Indexes don't match.
So my solution is for each match, I do a Find in the whole document for that match. the find methods gives a Range object from which the correct page can be determined.
In my special case a match could be the same thing also different value. Example: 343in my case would be the same as Prefix-343. A second issue was that the matches must be sorted eg 123before 324regardless which one occurs first in the document.
If you require the Sort Functionality you will also need the following to "modules":
SortDictionary Function:
http://www.cpearson.com/excel/CollectionsAndDictionaries.htm
Module "modQSortInPlace":
http://www.cpearson.com/Zips/modQSortInPlace.zip
If no sort is needed you don't need them but you need to remove the according function call SortDictionary Dict, Truefrom my code.
Now to my code. Soem parts you can remove, especially the formatting one. This is specific to my case. Also if your match is "unique", eg. not prefix or so you can simplify the code too. You will need to reference the "Microsoft Scripting Library".
Option Explicit
Sub ExtractRNumbers()
Dim Dict As Scripting.Dictionary
Set Dict = CreateObject("Scripting.dictionary")
Dim regExp, Match, Matches
Dim rNumber As String
Dim range As range
Set regExp = CreateObject("VBScript.RegExp")
regExp.Pattern = "\b(R-)?\d{2}-\d{4,5}(-\d)?\b"
regExp.IgnoreCase = False
regExp.Global = True
' determine main section, only extract R-Numbers from main section
' and not the Table of contents as example
' main section = section with most characters
Dim section As section
Dim maxSectionSize As Long
Dim sectionSize As Long
Dim sectionIndex As Integer
Dim currentIndex As Integer
maxSectionSize = 0
currentIndex = 1
For Each section In ActiveDocument.Sections
sectionSize = Len(section.range.text)
If sectionSize > maxSectionSize Then
maxSectionSize = sectionSize
sectionIndex = currentIndex
End If
currentIndex = currentIndex + 1
Next
Set Matches = regExp.Execute(ActiveDocument.Sections(sectionIndex).range.text)
For Each Match In Matches
' If the Document contains Tables, ActiveDocument.range.Text will contain
' BEL charachters (chr(7)) that probably define the table structure. The issue
' is that then Match.FirstIndex does not point to the actual first charachter
' of a Match in the Document.
' Also there are other things (unknwon) that lead to the same issue, eg.
' Match.FirstIndex can not be used to find the actual "matching word" within the
' document. Because of that below commented apporach does not work on a generic document
' Set range = ActiveDocument.range(Match.FirstIndex, Match.FirstIndex + Len(Match.Value))
' page = range.Information(wdActiveEndAdjustedPageNumber)
' Maybe there is a simpler solution but this works more or less
' the exception beign tables again. see http://support.microsoft.com/kb/274003
' After a match is found the whole document is searched using the find method.
' For each find result the page number is put into an array (if it is not in the array yet)
' Then the match is formatted properly.
' After formatting, it is checked if the match was previously already found
'
' If not, we add a new entry to the dictionary (key = formatted match, value = array of page numbers)
'
' If match was already found before (but potentially in a different format! eg R-87-1000 vs 87-1000 as example),
' all additional pages are added to the already found pages.
Set range = ActiveDocument.Sections(sectionIndex).range
With range.Find
.text = Match.Value
.MatchWholeWord = True
.MatchCase = True
.Wrap = wdFindStop
End With
Dim page As Variant
Dim pages() As Integer
Dim index As Integer
index = 0
ReDim pages(0)
Do While range.Find.Execute() = True
page = range.Information(wdActiveEndAdjustedPageNumber)
If Not IsInArray(page, pages) Then
ReDim Preserve pages(index)
pages(index) = page
index = index + 1
End If
Loop
' FORMAT TO PROPER R-NUMBER: This is specific to my case
rNumber = Match.Value
If Not rNumber Like "R-*" Then
rNumber = "R-" & rNumber
End If
' remove possible batch number as r-number
If Len(rNumber) > 11 Then
rNumber = Left(rNumber, Len(rNumber) - 2)
End If
' END FORMAT
If Not Dict.Exists(rNumber) Then
Dict.Add rNumber, pages
Else
Dim existingPages() As Integer
existingPages = Dict(rNumber)
For Each page In pages
If Not IsInArray(page, existingPages) Then
' add additonal pages. this means that the previous match
' was formatted different, eg R-87-1000 vs 87-1000 as example
ReDim Preserve existingPages(UBound(existingPages) + 1)
existingPages(UBound(existingPages)) = page
Dict(rNumber) = existingPages
End If
Next
End If
Next
'sort dictionary by key (R-Number)
SortDictionary Dict, True
Dim fso
Set fso = CreateObject("Scripting.FileSystemObject")
Dim stream
' Create a TextStream.
Set stream = fso.CreateTextFile(ActiveDocument.Path & "\" & ActiveDocument.Name & "-rNumbers.txt", True)
Dim key As Variant
Dim output As String
Dim i As Integer
For Each key In Dict.Keys()
output = key & vbTab
pages = Dict(key)
For i = LBound(pages) To UBound(pages)
output = output & pages(i) & ", "
Next
output = Left(output, Len(output) - 2)
stream.WriteLine output
Next
Set Dict = Nothing
stream.Close
End Sub
Private Function IsInArray(page As Variant, pages As Variant) As Boolean
Dim i As Integer
IsInArray = False
For i = LBound(pages) To UBound(pages)
If pages(i) = page Then
IsInArray = True
Exit For
End If
Next
End Function

regular expressions and vba

Does anyone know how to extract matches as strings from a RegExp.Execute() function?
Let me show you what I've gotten to so far:
Regex.Pattern = "^[^*]*[*]+"
Set myMatches = Regex.Execute(temp)
I want the object "myMatches" which is holding the matches, to be converted to a string. I know that there is only going to be one match per execution.
Does anyone know how to extract the matches from the object as Strings to be displayed lets say via a MsgBox?
Try this:
Dim sResult As String
'// Your expression code here...
sResult = myMatches.Item(0)
'// or
sResult = myMatches(0)
Msgbox("The matching text was: " & sResult)
The Execute method returns a match collection and you can use the item property to retrieve the text using an index.
As you stated you only ever have one match then the index is zero. If you have more than one match you can return the index of the match you require or loop over the entire collection.
This page has a lot of information on regex and seems to have what you want.
http://www.regular-expressions.info/vbscript.html