regular expressions and vba - regex

Does anyone know how to extract matches as strings from a RegExp.Execute() function?
Let me show you what I've gotten to so far:
Regex.Pattern = "^[^*]*[*]+"
Set myMatches = Regex.Execute(temp)
I want the object "myMatches" which is holding the matches, to be converted to a string. I know that there is only going to be one match per execution.
Does anyone know how to extract the matches from the object as strings so they can be displayed, let's say, via a MsgBox?

Try this:
Dim sResult As String
'// Your expression code here...
sResult = myMatches.Item(0)
'// or
sResult = myMatches(0)
Msgbox("The matching text was: " & sResult)
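The Execute method returns a MatchCollection. Its Item property (the default property, which is why myMatches(0) also works) takes a zero-based index and returns a Match object, whose default Value property is the matched text.
Since you only ever have one match, the index is 0. If there can be more than one match, pass the index of the match you need or loop over the entire collection. Check myMatches.Count first if the pattern might not match at all, because indexing an empty collection raises an error.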
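For comparison, here is the same "match object to string" step sketched in Python rather than VBA. The sample input in temp is made up for illustration; only the pattern comes from the question.

```python
import re

# Same pattern as in the question: everything up to and including
# the first run of asterisks at the start of the string.
pattern = re.compile(r"^[^*]*[*]+")

temp = "abc***def"  # hypothetical input, not from the original post

# search() returns a match object (or None), not a string;
# group(0) extracts the matched text, like Match.Value in VBA.
first = pattern.search(temp)
result = first.group(0) if first else ""

# Looping over every match, like iterating a MatchCollection.
all_matches = [m.group(0) for m in pattern.finditer(temp)]

print("The matching text was: " + result)
```

As in the VBA version, the no-match case has to be handled explicitly before the text is read.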

This page has a lot of information on regex and seems to have what you want.
http://www.regular-expressions.info/vbscript.html

Related

Regex with string.find in Lua

I use string.find(str, pattern) in Lua to fetch some keywords.
local RICH_TAGS = {
    "texture",
    "img",
}
--\[((img)|(texture))=
local START_OF_PATTER = "\\[("
for index = 1, #RICH_TAGS - 1 do
    START_OF_PATTER = START_OF_PATTER .. "(" .. RICH_TAGS[index] .. ")|"
end
START_OF_PATTER = START_OF_PATTER .. "(" .. RICH_TAGS[#RICH_TAGS] .. "))"
function RichTextDecoder.decodeRich(str)
    local result = {}
    print(str, START_OF_PATTER)
    dump({string.find(str, START_OF_PATTER)})
end
output
hello[img=123] \[((texture)|(img))
dump from: [string "utils/RichTextDecoder.lua"]:21: in function 'decodeRich'
"<var>" = {
}
The output means:
str = hello[img=123]
START_OF_PATTER = \[((texture)|(img))
This regex works well with some online regex tools, but it finds nothing in Lua.
Is there anything wrong with my code?
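Lua's standard string library does not support regular expressions. It uses Lua string patterns, which have no alternation operator (|) and use % rather than \ as the escape character.
See How to write this regular expression in Lua?
Try dump({str:find("%[img=")}). Here %[ matches a literal [.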
Also note that this loop:
for index = 1, #RICH_TAGS - 1 do
START_OF_PATTER = START_OF_PATTER .. "(" .. RICH_TAGS[index]..")|"
end
stops before the last element of RICH_TAGS; that element only ends up in the pattern because the line after the loop appends it separately.
Edit:
But what I want is to fetch several specific words. For example, the
pattern should match any one of "[img=", "[texture=", or "[font=". With
the regex string I wrote in my question, a regex engine can do the job. But
with Lua, the way to do it is to write code like string.find(str,
"[img=") and string.find(str, "[texture=") and string.find(str,
"[font="). I would expect there to be a way to do the job with a single
pattern string. I tried a pattern string like "%[%a*=", but obviously it
matches many more strings than I need.
You cannot match several specific words with a single Lua pattern unless they appear in the string in a specific order. The only thing you could do is put all the characters that make up those words into a character class, but then you risk matching any word that can be built from those letters.
Usually you would match each word with a separate pattern or you match any word and check if the match is one of your words using a look up table for example.
So basically you do what a regex library would do in a few lines of Lua.
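The "match any word, then check a lookup table" approach can be sketched as follows. This is written in Python rather than Lua, and it deliberately uses a generic pattern without alternation to mirror what a Lua pattern can express (roughly "%[(%a+)=" in Lua syntax). The function name, the sample input, and the extra "font" entry in the table are assumptions for illustration.

```python
import re

# Lookup table of accepted tag names (hypothetical; extend as needed).
RICH_TAGS = {"texture", "img", "font"}

def find_rich_tags(text):
    """Return (start_index, tag) for every '[tag=' whose tag is in RICH_TAGS."""
    found = []
    # Generic pattern, no alternation: '[' + letters + '='.
    for m in re.finditer(r"\[([A-Za-z]+)=", text):
        # Keep the match only if the captured word is one we care about.
        if m.group(1) in RICH_TAGS:
            found.append((m.start(), m.group(1)))
    return found

hits = find_rich_tags("hello[img=123] [size=4] [texture=a.png]")
print(hits)
```

Note that Python indexes are zero-based, whereas string.find in Lua returns one-based positions.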

add datetime as string to a string after matching a pattern in vb.net

I have this string for example: "Example_string.xml"
and I would like to add _DateTime of now before the "." so it will look like:
"Example_string_20151808185631.xml"
How can I achieve this? With a regex?
Yes, you can achieve that with a lookahead. For instance:
Dim result As String = Regex.Replace("Example_string.xml", "(?=\.)", "_20151808185631")
Since the pattern only matches a position in the string (the position just before the period), rather than matching a portion of the text, the replace method doesn't actually replace any of the input text. It effectively just inserts the replacement text into that position in the string.
Alternatively, if you find that confusing, you could just match the period and then just include the period in the replacement text:
Dim result As String = Regex.Replace("Example_string.xml", "\.", "_20151808185631.")
If you don't want to match just any period, and you want to be safer about it (such as handling file names that contain multiple periods), then instead of \. you could use something like \.\w+$. However, if you need to make it that resilient, and it doesn't have to be done with RegEx, it would be better to use the Path.GetFileNameWithoutExtension and Path.GetExtension methods, as recommended by Crowcoder. For instance, you may also need to handle file names that have no extension, which complicates the pattern even further.
or...
Path.GetFileNameWithoutExtension("Example_string.xml") + "_20151808185631" + Path.GetExtension("Example_string.xml")
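To make the difference between the two approaches concrete, here is a small sketch in Python rather than VB.NET. The function names are made up for illustration, re.sub stands in for Regex.Replace, and os.path.splitext stands in for the Path methods. The lookahead variant uses the stricter \.\w+$ form so that only the final extension is affected.

```python
import os.path
import re
from datetime import datetime

def stamp_with_regex(filename, stamp):
    # Zero-width lookahead: matches the position just before the final
    # ".ext", so nothing is replaced and the stamp is simply inserted.
    return re.sub(r"(?=\.\w+$)", "_" + stamp, filename)

def stamp_with_splitext(filename, stamp):
    # Split off the extension explicitly; also works without an extension.
    root, ext = os.path.splitext(filename)
    return root + "_" + stamp + ext

stamp = datetime.now().strftime("%Y%m%d%H%M%S")
print(stamp_with_regex("Example_string.xml", stamp))
print(stamp_with_splitext("Example_string.xml", stamp))
```

For a name without an extension, the regex variant returns the input unchanged because the lookahead never matches, while the path-based variant still appends the stamp. That is the kind of edge case that makes the path-based approach easier to reason about.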
How about:
Dim sFile As String = "Example_string.xml"
' Note: ToLower lower-cases the whole file name, not only the extension.
Dim sResult As String = sFile.ToLower.Replace(".xml", "_" & Format(Now(), "yyyyMMddHHmmss") & ".xml")
MsgBox(sResult, , sFile)

How to capture value between two strings in VB.NET

I'm trying to capture a value between two strings using VB.NET.
Each line of the file I'm reading can contain many different parameters, in any order, and I'd like to store the values of these parameters in their own variables. Two sample lines would be:
identifier="121" messagecount="112358" timestamp="11:31:41.622" column="5" row="98" colour="ORANGE" value="Hello"
or it could be:
identifier="1121" messagecount="1123488" timestamp="19:14:41.568" valid="true" state="running"
Also, this may not be the sole text in the string; there may be other values before, after, and in between the parameters I would like to capture.
So essentially I'd need to store everything between 'identifier="' and its closing '"' in an identifier variable, and so on. As the order of these parameters within each line can change, I can't simply put the first value in the same variable each time; I have to refer to them specifically by name (identifier, messagecount, etc.).
Can anyone help? Thanks. I guess it would be done with a regular expression, but I'm not too hot on those. I'd prefer to have a separate expression for each parameter in its own statement, rather than everything in one.
Here is a sample how you can go about that. It converts one line into a dictionary.
This will capture any string consisting of a-z-characters (case-insensitive) as the attribute name, and then catch any character other than " in the value string. (If " can occur in the string as "" you need to add some treatment for that.)
Imports System.Text.RegularExpressions
[...]
Dim s As String =
    "identifier=""121"" messagecount=""112358"" " &
    "timestamp=""11:31:41.622"" column=""5"" row=""98"" " &
    "colour=""ORANGE"" value=""Hello"""
Dim d As New Dictionary(Of String, String)
Dim rx As New Regex("([a-z]+)=""(.*?)""", RegexOptions.IgnoreCase)
Dim rxM As MatchCollection = rx.Matches(s)
For Each M As Match In rxM
    d.Add(M.Groups(1).Value, M.Groups(2).Value)
Next
' Dictionary is ready
' test output
For Each k As String In d.Keys
    MsgBox(String.Format("{0} => {1}", k, d(k)))
Next
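The same name="value" parsing idea, sketched in Python rather than VB.NET for comparison. The function name and the shortened sample line are assumptions for illustration. Unlike Dictionary.Add, which throws on a duplicate key, this sketch silently keeps the last value for a repeated name.

```python
import re

# A run of letters as the attribute name, then ="...", where the value
# may contain anything except a double quote.
ATTRIBUTE = re.compile(r'([a-z]+)="([^"]*)"', re.IGNORECASE)

def parse_attributes(line):
    """Turn one line of name="value" pairs into a dict; other text is ignored."""
    return {name: value for name, value in ATTRIBUTE.findall(line)}

line = 'junk identifier="121" messagecount="112358" timestamp="11:31:41.622" colour="ORANGE"'
attrs = parse_attributes(line)

# Look values up by name, regardless of their order in the line.
print(attrs["identifier"], attrs.get("row"))
```

Because each value is looked up by name, a parameter that is missing from a line simply yields no entry (attrs.get returns None) instead of shifting the other values.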
You just need to split the data into manageable clumps, and then go through it. Something like this to start you off.
Private Sub ProcessMyData(LineOfData As String)
    ' NOTE! This assumes that neither names nor values contain spaces!
    Dim vElements = LineOfData.Split({" "c}, StringSplitOptions.RemoveEmptyEntries)
    For Each vElement In vElements
        ' Split on the first "=" only, so values containing "=" stay intact
        Dim vPair = vElement.Split({"="c}, 2)
        ' Skip tokens that are not name=value pairs
        If vPair.Length < 2 Then Continue For
        ' Strip the surrounding double quotes (character code 34)
        Dim vResult = vPair(1).Trim(Convert.ToChar(34))
        Select Case vPair(0).ToLower
            Case "identifier"
                MyIDVariable = CInt(vResult)
            Case "colour"
                MyColourVariable = vResult
                ' etc., etc.
        End Select
    Next
End Sub
You can define the variables you want locally in the sub [function], and then return a list/dictionary/custom class of the things you're interested in.

Find word with RegExp and bold

I have a Word document in which I want to find all the words that have the following layout: ABC-12:123456 DEF. Wherever this is found in the document, the word should be selected and set to bold. (Later I'll add a hyperlink instead of bold.) I have successfully found the words and put them in a MatchCollection just to try RegExp. It looks like:
Sub searchDocument()
    Set matchPattern = New RegExp
    matchPattern.Pattern = "ABC-\d{2}:\d{6} DEF"
    matchPattern.Global = True
    Dim matchPatternWords As MatchCollection
    Set matchPatternWords = matchPattern.Execute(ActiveDocument.Range)
    For Each matchPatternWord In matchPatternWords
        MsgBox (matchPatternWord)
    Next matchPatternWord
End Sub
You need to go from the regexp match to the range object representing the match.
Set matchRange = ActiveDocument.Range( _
    matchPatternWord.FirstIndex, _
    matchPatternWord.FirstIndex + matchPatternWord.Length)
would be the obvious invocation.
However, this post indicates that there might be issues with this approach, because formatting can throw off the character count. It's from 2010, though, so the issue might be resolved in a better way now.
If the above doesn't work, or if you don't trust it, you can search for the matched text instead:
Set matchRange = ActiveDocument.Range
matchRange.Find.Execute FindText:=matchPatternWord.Value
When Execute finds the text, matchRange is redefined to cover it. This approach needs a bit more handling if the same word can occur more than once.
Once you have the range, the rest is straightforward:
matchRange.Bold = True

How do you use RegEx to return a parsed value?

I have a data column that holds a heading value with multiple levels. I only want the first three levels, but I cannot figure out how to get the parsed value.
I was reading this and it shows how to create a function that returns a Boolean for the condition, but how would I create a function that returns a parsed value?
This is the Regular Expression that I think I need.
^(\d.\d.\d)
I'm looking for something that would change 1.2.3.4.5. to 1.2.3 and similar for any other header I have that has more than three levels.
Ideally, I'd like to be able to put it into my Query Design as a Field Expression, but I'm not sure how I would do that.
I assumed your input values could have more than one digit between the dots. In other words, I think you want this ...
? RegExpGetMatch("1.2.3.4.5.", "^(\d+\.\d+\.\d+).*", 1)
1.2.3
? RegExpGetMatch("1.27.3.4.5.", "^(\d+\.\d+\.\d+).*", 1)
1.27.3
If that is the correct behavior, here is the function I used.
Public Function RegExpGetMatch(ByVal pSource As String, _
ByVal pPattern As String, _
ByVal pGroup As Long) As String
'requires reference to Microsoft VBScript Regular Expressions
'Dim re As RegExp
'Set re = New RegExp
'late binding; no reference needed
Dim re As Object
Set re = CreateObject("VBScript.RegExp")
re.Global = True
re.Pattern = pPattern
RegExpGetMatch = re.Replace(pSource, "$" & pGroup)
Set re = Nothing
End Function
See also this answer by KazJaw. His answer taught me how to select the match group with RegExp.Replace.
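The "return a capture group by replacing the whole match with it" trick can be sketched like this, in Python rather than VBA. The function name mirrors the VBA one but is otherwise an illustration, and re.sub stands in for RegExp.Replace.

```python
import re

def regexp_get_match(source, pattern, group):
    """Replace each match of pattern with the text of the given capture group.

    Because the pattern below ends in .*, the match covers the whole
    string, so the result is just the captured group.
    """
    return re.sub(pattern, "\\g<{}>".format(group), source)

PATTERN = r"^(\d+\.\d+\.\d+).*"

print(regexp_get_match("1.2.3.4.5.", PATTERN, 1))
print(regexp_get_match("1.27.3.4.5.", PATTERN, 1))
```

One consequence of this design is that an input the pattern does not match is returned unchanged rather than as an empty string, so a heading with fewer than three levels passes through as it is.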
In a query run within an Access session, you could use the function like this:
SELECT
RegExpGetMatch([Data Column], "^(\d+\.\d+\.\d+).*", 1) AS parsed_value
FROM YourTable;
Note however a custom VBA function is not usable for queries run from outside an Access session.
Try changing your RegEx to ^(\d\.\d\.\d). You need to escape the . because an unescaped . matches any character in RegExp.