From strings that are similar to this string:
|cff00ccffkey:|r value
I need to remove |cff00ccff and |r to get:
key: value
The problem is that |cff00ccff is a color code. I know it always starts with |c, but the next 8 characters could be anything. So I need a gsub pattern that matches the next 8 characters (alphanumeric only) after |c.
How can I do this in Lua? I have tried:
local newString = string.gsub("|cff00ccffkey:|r value", "|c%w*", "")
newString = string.gsub(newString, "|r", "")
but that removes everything up to the first non-alphanumeric character (the colon), and I don't know how to specify the maximum number of characters to match.
Thank you.
Lua patterns do not support range/interval/limiting quantifiers.
You may repeat the %w (alphanumeric) class eight times:
local newString = string.gsub("|cff00ccffkey:|r value", "|c%w%w%w%w%w%w%w%w", "")
newString = string.gsub(newString, "|r", "")
print(newString)
-- => key: value
See the Lua demo online.
You may also make it a bit more dynamic by building the pattern with ('%w'):rep(8):
local newString = string.gsub("|cff00ccffkey:|r value", "|c" ..('%w'):rep(8), "")
See another Lua demo.
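For comparison, here is a minimal Java sketch of the same idea. Unlike Lua patterns, java.util.regex does support interval quantifiers, so the eight characters can be written as {8}. The pattern can also be built by repetition, mirroring ('%w'):rep(8). The class name is hypothetical, and [A-Za-z0-9] is an assumed stand-in for Lua's %w (which follows the C locale).

```java
class ColorCodeStrip {
    // Built by repetition, like ('%w'):rep(8); [A-Za-z0-9] approximates Lua's %w
    static final String REPEATED = "\\|c" + "[A-Za-z0-9]".repeat(8);

    // The same pattern with an interval quantifier, which Lua patterns lack
    static final String QUANTIFIED = "\\|c[A-Za-z0-9]{8}";

    static String strip(String s, String colorPattern) {
        // Remove the |c + 8 chars color code, then every literal |r
        String noColor = s.replaceAll(colorPattern, "");
        return noColor.replace("|r", "");
    }

    public static void main(String[] args) {
        String input = "|cff00ccffkey:|r value";
        System.out.println(strip(input, REPEATED));   // key: value
        System.out.println(strip(input, QUANTIFIED)); // key: value
    }
}
```

Note that | has to be escaped in Java regex, where it means alternation; in Lua patterns it is an ordinary character. String.repeat requires Java 11 or later.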
If your strings always follow the |c<8 alphanumeric chars><text>|r<value> format, you may also use a pattern like
local newString = string.gsub("|cff00ccffkey:|r value", "^|c" ..('%w'):rep(8) .. "(.-)|r(.*)", "%1%2")
See this Lua demo.
Here, the pattern matches:
^ - start of string
|c - a literal |c
" ..('%w'):rep(8) .. " - 8 alphanumeric chars
(.-) - Group 1: any 0+ chars, as few as possible
|r - a |r substring
(.*) - Group 2: the rest of the string.
The %1 and %2 refer to the values captured into corresponding groups.
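The capture-group version carries over to Java almost directly, assuming $1 and $2 in place of Lua's %1 and %2 and the lazy .*? in place of Lua's .- (as few as possible). A sketch with a hypothetical class name:

```java
class ColorCodeGroups {
    // ^|c, 8 alphanumerics, group 1 = text up to the first |r, group 2 = the rest
    static final String PATTERN = "^\\|c[A-Za-z0-9]{8}(.*?)\\|r(.*)";

    static String strip(String s) {
        // Strings that do not match are returned unchanged
        return s.replaceAll(PATTERN, "$1$2");
    }

    public static void main(String[] args) {
        System.out.println(strip("|cff00ccffkey:|r value")); // key: value
    }
}
```

One difference to keep in mind: Lua's . matches any character, while Java's . does not match line terminators unless the DOTALL flag is set.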
I am trying to remove any string between - and / (a branch prefix).
Example:
String name = Application-2.0.2-bug/TEST-1.0.0.zip
Expected output:
Application-2.0.2-TEST-1.0.0.zip
I tried the regex below, but it does not work correctly.
String FILENAME = "Application-2.0.2-bug/TEST-1.0.0.zip"
println(FILENAME.replaceAll(".+/", ""))
You can use
FILENAME.replaceAll("-[^-/]+/", "-")
See the regex demo. Details:
- - a hyphen
[^-/]+ - any one or more chars other than - and /
/ - a / char.
See the online Groovy demo:
String FILENAME = 'Application-2.0.2-bug/TEST-1.0.0.zip'
println(FILENAME.replaceAll("-[^-/]+/", "-"))
// => Application-2.0.2-TEST-1.0.0.zip
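Groovy's replaceAll on a String is Java's String.replaceAll, so the same pattern runs unchanged in plain Java. The sketch below (hypothetical class name) also shows why the original .+/ failed: .+ is greedy and consumes everything up to the last /, version prefix included.

```java
class BranchPrefixStrip {
    static String original(String name) {
        // Greedy .+ eats everything up to the last '/', version included
        return name.replaceAll(".+/", "");
    }

    static String fixed(String name) {
        // A hyphen, then one or more chars that are neither '-' nor '/', then '/'
        return name.replaceAll("-[^-/]+/", "-");
    }

    public static void main(String[] args) {
        String filename = "Application-2.0.2-bug/TEST-1.0.0.zip";
        System.out.println(original(filename)); // TEST-1.0.0.zip
        System.out.println(fixed(filename));    // Application-2.0.2-TEST-1.0.0.zip
    }
}
```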
I find Groovy closures the most intuitive and easiest-to-understand way to do string replacements.
def str = "Application-2.0.2-bug/TEST-1.0.0.zip"
def newStr = str.replaceAll(/(.*-)(.*\/)(.*)/){all,group1,group2,group3 ->
println all
println group1
println group2
println group3
"${group1}${group3}" //this is the return value of the closure
}
println newStr
This is the output:
Application-2.0.2-bug/TEST-1.0.0.zip
Application-2.0.2-
bug/
TEST-1.0.0.zip
Application-2.0.2-TEST-1.0.0.zip
Explanation:
Notice that each part of the regex is wrapped in parentheses (). These define capturing groups, which can then be used easily in a closure.
all - the first parameter is always the full match (here, the whole input string)
group1 - (.*-) to indicate all chars ending with -
group2 - (.*\/) to indicate all chars ending with / (escaped with \).
group3 - (.*) all remaining chars
Now for your requirement all you need is to eliminate group2 and return a concatenation of group1 and group3.
This technique makes closures quite powerful. Just make sure the closure takes one more parameter than there are groups in the regex (here, 4 parameters for 3 groups), since the first parameter is always the full match. You can use as many groups as your scenario needs.
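Plain Java (9 and later) has a close counterpart to the Groovy closure: Matcher.replaceAll accepts a Function<MatchResult, String>, and the MatchResult exposes group(0) (the full match) plus each numbered group. A minimal sketch, with a hypothetical class name:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ClosureStyleReplace {
    static final Pattern P = Pattern.compile("(.*-)(.*/)(.*)");

    static String dropBranch(String s) {
        Matcher m = P.matcher(s);
        // The lambda plays the role of the closure: keep groups 1 and 3, drop group 2
        return m.replaceAll(r -> Matcher.quoteReplacement(r.group(1) + r.group(3)));
    }

    public static void main(String[] args) {
        System.out.println(dropBranch("Application-2.0.2-bug/TEST-1.0.0.zip"));
        // Application-2.0.2-TEST-1.0.0.zip
    }
}
```

In Java the string returned by the function is still parsed for $ and \ references, which is why the sketch wraps it in Matcher.quoteReplacement.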
Try this one:
String FILENAME = "Application-2.0.2-**bug/**TEST-1.0.0.zip";
System.out.println(FILENAME.replaceAll("\\*\\*(.*)\\*\\*", ""));
I'm reading COM port results using a VB6 application, and I need to replace some characters using regular expressions.
The issue is primarily this: I'm getting a lot of unnecessary characters between the "R" and "|" characters, which I'd like to remove. For this, I'm using the Replace function with regular expressions, but it's not working.
This is the code I've written in VB6:
objReg.Pattern = "R.*\|"
objReg.Global = True
x$ = objReg.Replace(Text1.Text, "R|")
Input stream:
R<ETB>DA<STX>3|4|
which is ("R" + ETB + "DA" + STX + "3|4|")
Expected Result:
R|4|
Any help in this regard would be much appreciated, thanks!
You may use
objReg.Pattern = "R[^|]+\|"
x$ = objReg.Replace(Text1.Text, "R|")
See the regex demo
The regex will match R, then one or more chars other than | (with the [^|]+ pattern) and then a literal | char. The whole match will be replaced with R|.
You may also use capturing groups with backreferences here if you need to make any more additions to the pattern:
objReg.Pattern = "(R)[^|]+(\|)"
x$ = objReg.Replace(Text1.Text, "$1$2")
The (R) group will correspond to the $1 backreference and (\|) will correspond to $2.
See another regex demo.
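The difference between the greedy R.*\| and the negated class R[^|]+\| is easy to check in Java, whose regex syntax agrees with VBScript's for these simple patterns. In this sketch the ETB and STX control characters are assumed to be U+0017 and U+0002; the class and method names are hypothetical.

```java
class ComPortCleanup {
    // "R" + ETB + "DA" + STX + "3|4|"
    static final String INPUT = "R\u0017DA\u00023|4|";

    // Original pattern: greedy .* runs to the LAST '|', so "4|" is lost too
    static String greedy(String s) {
        return s.replaceAll("R.*\\|", "R|");
    }

    // [^|]+ cannot cross a '|', so the match stops at the first one
    static String negatedClass(String s) {
        return s.replaceAll("R[^|]+\\|", "R|");
    }

    // Same result via capturing groups and $1/$2 backreferences
    static String withGroups(String s) {
        return s.replaceAll("(R)[^|]+(\\|)", "$1$2");
    }

    public static void main(String[] args) {
        System.out.println(greedy(INPUT));       // R|
        System.out.println(negatedClass(INPUT)); // R|4|
        System.out.println(withGroups(INPUT));   // R|4|
    }
}
```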
I am writing an Excel add-in to read a text file, extract values and write them to an Excel file. I need to split a line delimited by one or more whitespace characters, store the parts in an array, and extract the desired values from it.
I am trying to implement something like this:
arrStr = Split(line, "/^\s*/")
But the editor is throwing an error while compiling.
How can I do what I want?
If you are looking for the Regular Expressions route, then you could do something like this:
Dim line As String, arrStr, i As Long
line = "This is a test"
With New RegExp
.Pattern = "\S+"
.Global = True
If .test(line) Then
With .Execute(line)
ReDim arrStr(.Count - 1)
For i = 0 To .Count - 1
arrStr(i) = .Item(i)
Next
End With
End If
End With
IMPORTANT: You will need to add a reference to
Microsoft VBScript Regular Expressions 5.5 in Tools > References.
Otherwise, use the late-binding version below.
Your original pattern, \^S*\$, had some issues:
S* matched a literal uppercase S, not the whitespace class you were looking for, because it was not escaped.
Even if it had been escaped, it would have matched every string you tried, because * means zero or more of \S, and zero occurrences always match. You were probably looking for the + quantifier (one or more).
Keeping it greedy (not using *?) was right, since you want to consume as much as possible.
The pattern I used, \S+, matches every run of one or more (+) characters that are NOT whitespace.
I also set .Global = True, so matching continues after the first match.
Once you have matched all your words, you can loop through the match collection and place them into an array.
Late Binding:
Dim line As String, arrStr, i As Long
line = "This is a test"
With CreateObject("VBScript.RegExp")
.Pattern = "\S+"
.Global = True
If .test(line) Then
With .Execute(line)
ReDim arrStr(.Count - 1)
For i = 0 To .Count - 1
arrStr(i) = .Item(i)
Next
End With
End If
End With
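The match-and-collect loop in both versions above maps onto Java's Matcher.find(): each \S+ match is one word, and runs of whitespace are simply skipped. A sketch with hypothetical class and method names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SplitOnWhitespace {
    static final Pattern WORD = Pattern.compile("\\S+");

    static List<String> words(String line) {
        List<String> result = new ArrayList<>();
        Matcher m = WORD.matcher(line);
        // Like .Global = True: keep finding matches after the first one
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("This  is a   test")); // [This, is, a, test]
    }
}
```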
Miscellaneous Notes
I would have advised just using Split(), but you stated that more than one consecutive space may be an issue. If that were not the case, you wouldn't need regex at all; something like:
arrStr = Split(line)
would split on every occurrence of a space.
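Consecutive spaces matter because splitting on a single space yields an empty element between adjacent delimiters. Java's String.split shows the same effect (this sketch illustrates the general behavior, not VBA's Split itself), and splitting on \s+ after trimming avoids it. Class and method names are hypothetical.

```java
import java.util.Arrays;

class SplitComparison {
    static String[] onSingleSpace(String line) {
        return line.split(" ");
    }

    static String[] onWhitespaceRuns(String line) {
        // trim() first: a leading delimiter would otherwise give an empty first element
        return line.trim().split("\\s+");
    }

    public static void main(String[] args) {
        String line = "This  is a test";
        System.out.println(Arrays.toString(onSingleSpace(line)));    // [This, , is, a, test]
        System.out.println(Arrays.toString(onWhitespaceRuns(line))); // [This, is, a, test]
    }
}
```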
What I'm trying to accomplish
I'm trying to create a function to use string interpolation within VBA. The issue is that I'm not sure how to replace "\n" with vbNewLine only when it is not preceded by the escape character "\".
What I have found and tried
As far as I can tell, VBScript regular expressions do not support negative lookbehind.
Below are two patterns I have already tried:
Private Sub testingInjectFunction()
Dim dict As New Scripting.Dictionary
dict("test") = "Line"
Debug.Print Inject("${test}1\n${test}2 & link: C:\\notes.txt", dict)
End Sub
Public Function Inject(ByVal source As String, dict As Scripting.Dictionary) As String
Inject = source
Dim regEx As Object
Set regEx = CreateObject("VBScript.RegExp")
regEx.Global = True
' PATTERN # 1 REPLACES ALL '\n'
'regEx.Pattern = "\\n"
' PATTERN # 2 REPLACES EXTRA CHARACTER AS LONG AS IT IS NOT '\'
regEx.Pattern = "[^\\]\\n"
' REGEX REPLACE
Inject = regEx.Replace(Inject, vbNewLine)
' REPLACE ALL '${dICT.KEYS(index)}' WITH 'dICT.ITEMS(index)' VALUES
Dim index As Integer
For index = 0 To dict.Count - 1
Inject = Replace(Inject, "${" & dict.Keys(index) & "}", dict.Items(index))
Next index
End Function
Desired result
Line1
Line2 & link: C:\notes.txt
Result for Pattern # 1: (Replaces when not wanted)
Line1
Line2 & link: C:\
otes.txt
Result for Pattern # 2: (Replaces the 1 in 'Line1')
Line
Line2 & link: C:\\notes.txt
Summary question
I can easily write code without regular expressions that achieves my goal, but I want to see whether there is a way to do it with regular expressions in VBA.
How can I use Regular Expressions in VBA to Replace "\n" with a vbNewLine, as long as it does not have the escape character "\" before it?
Yes, you may use a regex here. Since the backslash is not used to escape itself in these strings, you may modify your solution like this:
regEx.Pattern = "(^|[^\\])\\n"
S = regEx.Replace(S, "$1" & vbNewLine)
It matches and captures any char other than \ before \n, then puts it back with the $1 placeholder. Because \n may appear at the start of the string, the start-of-string anchor ^ is added as an alternative inside the capturing group.
Pattern details
(^|[^\\]) - Capturing group 1: start of string (^) or (|) any char but a backslash ([^\\])
\\ - a backslash
n - an n char.
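Here is a Java sketch of the same capture-and-put-back trick, next to the negative lookbehind that the question wanted (java.util.regex supports lookbehind; VBScript's RegExp does not). It assumes "\n" in the source is a literal backslash followed by n, and it leaves \\ untouched, as in the answer. The class and method names are hypothetical.

```java
class EscapedNewline {
    // Capture the preceding char (or start of string) and put it back with $1
    static String viaCapture(String s) {
        return s.replaceAll("(^|[^\\\\])\\\\n", "$1\n");
    }

    // Negative lookbehind: a backslash-n not preceded by a backslash
    static String viaLookbehind(String s) {
        return s.replaceAll("(?<!\\\\)\\\\n", "\n");
    }

    public static void main(String[] args) {
        String src = "Line1\\nLine2 & link: C:\\\\notes.txt";
        System.out.println(viaCapture(src));
        System.out.println(viaLookbehind(src));
    }
}
```

The two differ on adjacent sequences such as a\n\nb: the capture version consumes the n of the first \n as part of its match, so the second \n has no unconsumed preceding character and is left as-is, while the lookbehind version replaces both.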
[15-]
[41-(32)]
[48-(45)]
[70-15]
[40-(64)]
[(128)-42]
[(128)-56]
I have these values, and I want to extract the values that are not in round brackets (parentheses). If there is more than one, add them together.
What is the regular expression to do this?
So the solution would look like this:
[15-] -> 15
[41-(32)] -> 41
[48-(45)] -> 48
[70-15] -> 85
[40-(64)] -> 40
[(128)-42] -> 42
[(128)-56] -> 56
You would be overcomplicating things with a regex approach (in this case, at least). Also, regular expressions do not support mathematical operations, as pointed out by @richardtallent.
You can extract a substring that omits the initial and final square brackets, then use Split to split it in two on the dash sign. Lastly, use the InStr function to check whether either of the substrings that Split yielded contains a bracket.
Substrings that contain a bracket are omitted from the addition; the others are added up.
Regular expressions do not support performing math on the terms. You can loop through the groups that are matched and perform the math outside of the regex.
Here's the pattern to extract any number within the square brackets that is not in parentheses:
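A minimal Java sketch of the approach just described: strip the outer square brackets, split on the dash, and add up only the parts without a bracket (String.contains stands in for InStr). It assumes every value looks like the samples in the question; the class name is hypothetical.

```java
class BracketSum {
    // Sums the dash-separated parts of a "[...]" value, skipping parenthesized parts
    static int sum(String s) {
        // Omit the initial "[" and the final "]"
        String inner = s.substring(1, s.length() - 1);
        int total = 0;
        // Split on the dash; "15-" yields just "15" (trailing empties are dropped)
        for (String part : inner.split("-")) {
            // Skip empty parts and parts containing a bracket
            if (part.isEmpty() || part.contains("(")) {
                continue;
            }
            total += Integer.parseInt(part);
        }
        return total;
    }

    public static void main(String[] args) {
        String[] samples = {"[15-]", "[41-(32)]", "[48-(45)]", "[70-15]",
                            "[40-(64)]", "[(128)-42]", "[(128)-56]"};
        for (String s : samples) {
            System.out.println(s + " -> " + sum(s));
        }
    }
}
```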
\[
(?:(?:\d+|\([^\)]*\))-)*
(\d+)
(?:-[^\]]*)*
\]
Each number will be returned in $1.
This works by looking for a number that is prefixed by any number of "words" separated by dashes, where the "words" are either numbers themselves or parenthesized strings, and followed by, optionally, a dash and some other stuff before hitting the end brace.
If VBA's RegExp doesn't support non-capturing groups (?:), remove all of the ?:'s and your captured numbers will be in $3 instead.
A simpler pattern also works:
\[
(?:[^\]]*-)*
(\d+)
(?:-[^\]]*)*
\]
This simply looks for numbers delimited by dashes and allowing for the number to be at the beginning or end.
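To show the "match with a regex, do the math outside it" idea end to end, here is a Java sketch. It swaps in a different technique from the patterns above: an alternation whose first branch consumes parenthesized numbers without capturing them, while the second branch captures bare numbers in group 1, which a loop then adds up. The class name is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexBracketSum {
    // Branch 1 consumes "(digits)" so those digits are never captured;
    // branch 2 captures bare digits in group 1
    static final Pattern NUM = Pattern.compile("\\(\\d+\\)|(\\d+)");

    static int sum(String s) {
        int total = 0;
        Matcher m = NUM.matcher(s);
        while (m.find()) {
            // group(1) is null when the parenthesized branch matched
            if (m.group(1) != null) {
                total += Integer.parseInt(m.group(1));
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum("[70-15]"));    // 85
        System.out.println(sum("[(128)-42]")); // 42
    }
}
```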
Private Sub regEx()
' Requires a reference to Microsoft VBScript Regular Expressions 5.5
Dim RegexObj As New VBScript_RegExp_55.RegExp
RegexObj.Pattern = "\[(\(?[0-9]*?\)?)-(\(?[0-9]*?\)?)\]"
Dim str As String
str = "[15-]"
Dim Match As Object
Set Match = RegexObj.Execute(str)
Dim result As Integer
Dim value1 As Integer
Dim value2 As Integer
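' InStr returns the position of "(" (0 if absent), so compare with 0 explicitly:
' Not applied to a nonzero position (e.g. Not 1 = -2) is still True
If InStr(1, Match.Item(0).submatches.Item(0), "(") = 0 And Match.Item(0).submatches.Item(0) <> "" Then
value1 = Match.Item(0).submatches.Item(0)
End If
If InStr(1, Match.Item(0).submatches.Item(1), "(") = 0 And Match.Item(0).submatches.Item(1) <> "" Then
value2 = Match.Item(0).submatches.Item(1)
End If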
result = value1 + value2
MsgBox (result)
End Sub
Replace [15-] with the other strings to test them.
OK, it's been 6 years and 6 months since the question was posted, but for anyone looking for something like this now or in the future, here is a find-and-replace approach:
Step 1:
Trim Leading and Trailing Spaces, if any
Step 2:
Find/Search:
\]|\[|\(.*\)
Replace With:
<Leave this field Empty>
Step 3:
Trim Leading and Trailing Spaces, if any
Step 4:
Find/Search:
^-|-$
Replace With:
<Leave this field Empty>
Step 5:
Find/Search:
-
Replace With:
\+
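The five steps can be chained in Java, treating each Find/Replace as a replaceAll call. This is a sketch of the steps as listed, assuming the editor's regex flavor behaves like java.util.regex for these patterns; the class name is hypothetical. Note that for [70-15] the steps produce the expression 70+15, not the sum 85, so evaluating it is left to whatever consumes the result.

```java
class FindReplaceSteps {
    static String apply(String s) {
        String r = s.trim();                          // Step 1: trim
        r = r.replaceAll("\\]|\\[|\\(.*\\)", "");     // Step 2: drop brackets and "(...)"
        r = r.trim();                                 // Step 3: trim again
        r = r.replaceAll("^-|-$", "");                // Step 4: drop a leading or trailing dash
        return r.replace("-", "+");                   // Step 5: a remaining dash becomes "+"
    }

    public static void main(String[] args) {
        String[] samples = {"[15-]", "[41-(32)]", "[70-15]", "[(128)-42]"};
        for (String s : samples) {
            System.out.println(s + " -> " + apply(s));
        }
    }
}
```

Step 5 uses a literal replace here, so the + needs no escaping; in the editor's regex replacement it is written \+.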