Regular expression matching more than I need it to

I'm having some trouble writing a regular expression that will pick out all the calls to the "tr" function in the block of ASP code below. Specifically, I need to get the string passed to each "tr" call.
if(RS.Fields("Audid").Value <> 0 ) Then
Response.Write ("<td>" & tr("RA Assigned") & "</td>")
else
Response.Write ("<td>" & tr("Not Yet Assigned") & "</td>")
End if
if(RS.Fields("rStatus").Value = "Activated") then
Response.Write("<td><A HRef='portal_setup_billingII.asp?OrderPId=" & RS.Fields("CustomerParid").Value & "&OrderId=" & RS.Fields("OrderId").Value & "'>" & tr("Edit") &"</A></td></TR>")
Else
If (gParLevelz_Admin = gParLevelz and RS.Fields("CustomerParid").Value <> 0) Then
Response.Write("<td><A HRef='portal_setup_billingII.asp?OrderPId=" & RS.Fields("CustomerParid").Value & "&OrderId=" & RS.Fields("OrderId").Value & "'>" & tr("Awaiting Authorization") & "</A></td></TR>")
else
Response.Write("<td>" & tr("Awaiting Authorization") & "</td></TR>")
End if
End if
I believe I have a good first attempt at getting this done. The following expression extracts values for most of the cases I will run into...
tr\(\"([^%]|%[0-9]+)+\"\)
What's causing me the most confusion and stress is how to capture every kind of string that can show up in the "tr" call. Literally anything could be between the quotation marks of the "tr" call, and unfortunately my expression returns text past the closing quotation mark. Given the snippet posted above, one of the matches is:
tr("RA Assigned %2") & "</td>")
else
Response.Write ("<td>" & tr("Not Yet Assigned %4") & "</td>")
End if
if(RS.Fields("rStatus").Value = "Activated") then
Response.Write("<td><A HRef='portal_setup_billingII.asp?OrderPId=" & RS.Fields("CustomerParid").Value & "&OrderId=" & RS.Fields("OrderId").Value & "'>" & tr("Edit") &"</A></td></TR>")
Else
If (gParLevelz_Admin = gParLevelz and RS.Fields("CustomerParid").Value <> 0) Then
Response.Write("<td><A HRef='portal_setup_billingII.asp?OrderPId=" & RS.Fields("CustomerParid").Value & "&OrderId=" & RS.Fields("OrderId").Value & "'>" & tr("Awaiting Authorization") & "</A></td></TR>")
else
Response.Write("<td>" & tr("Awaiting Authorization") & "</td></TR>")
Which is way more than I want. I just want tr("RA Assigned %2") to be returned.

It looks like your regex pattern is greedy. Try making it non-greedy by adding a ? after the second +: tr\(\"([^%]|%[0-9]+)+?\"\)
A simplified version to capture anything inside the tr(...) would be: tr\(\"(.+?)\"\)
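To see why the change from + to +? matters, here is a small sketch in Python. Python is used only as a convenient test bed and is not the asker's ASP environment, but for this pattern greedy and lazy quantifiers behave the same way in Python's re, PCRE and .NET. The sketch runs the question's pattern and its lazy variant over two lines of the sample:

```python
import re

# Two lines from the question's snippet, joined with a newline.
sample = (
    'Response.Write ("<td>" & tr("RA Assigned %2") & "</td>")\n'
    'Response.Write ("<td>" & tr("Not Yet Assigned %4") & "</td>")'
)

greedy = r'tr\("([^%]|%[0-9]+)+"\)'  # the question's pattern (without the \" escapes)
lazy = r'tr\("([^%]|%[0-9]+)+?"\)'   # the same pattern with +? in place of the outer +

# Greedy: [^%] also matches quotes and newlines, so the match runs on to the LAST ") in the text.
greedy_match = re.search(greedy, sample).group(0)

# Lazy: the engine stops at the FIRST ") that lets the whole pattern succeed.
lazy_matches = [m.group(0) for m in re.finditer(lazy, sample)]

print(greedy_match)
print(lazy_matches)
```

The greedy version returns one long match that spans both lines. The lazy version returns one match per call.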

Use a question mark after the + quantifier to make it non-greedy (it then matches only as much as it needs).
Also, you could anchor against ") & " if that always follows a call to tr().

You'll need a non-greedy pattern; just add a ? after the second +, like:
tr\(\"([^%]|%[0-9]+)+?\"\)

tr\(\"([^\"]*)\"\)

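tr\(\".*?\"\)
In regex, . = any character (except a newline), * = any number of times (including 0), and the ? after * makes it lazy, so the match stops at the first closing quote followed by ). The parentheses of the call must be escaped; otherwise ( starts a group.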

Just don't match on the double-quote character inside the string:
tr\(\"([^\"]+)\"\)
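A negated class like [^\"]+ cannot cross a quotation mark, so the match has to end at the first closing quote regardless of the quantifier. The Python sketch below checks this. It also makes one assumption that goes beyond the answer: in ASP/VBScript, a quote inside a string literal is written as a doubled "", which [^\"]+ would cut short. The second pattern is therefore a hypothetical variant that also accepts "" pairs:

```python
import re

sample = (
    'Response.Write ("<td>" & tr("RA Assigned") & "</td>")\n'
    'Response.Write ("<td>" & tr("Say ""hi""") & "</td>")'
)

# [^"]+ cannot cross a quote, so each match ends at the first closing '")'.
simple = re.findall(r'tr\("([^"]+)"\)', sample)

# Hypothetical variant: also allow VBScript's doubled-quote escape "" inside the string.
with_escapes = re.findall(r'tr\("((?:[^"]|"")*)"\)', sample)

print(simple)
print(with_escapes)
```

Both patterns find the first call. Only the variant recovers the string that contains doubled quotes.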

I'm not sure if it's perfect, but it properly retrieved all of the entries in your sample. While testing the other expressions on this page I found that some erroneous entries were being returned. This one does not return any bad data:
tr\("([\W\w\s]+?)"\)
The result contains both the entire function call (array [0]) and the string inside the call (array [1]); [\W\w\s] matches any character, including newlines. I tested it with the following input:
Response.Write ("<td>" & tr("RA Assigned") & "</td>")
Response.Write ("<td>" & tr("Not Yet Assigned") & "</td>")
Response.Write("<td><A HRef='portal_setup_billingII.asp?OrderPId=" & RS.Fields("CustomerParid").Value & "&OrderId=" & RS.Fields("OrderId").Value & "'>" & tr("Edit") &"</A></td></TR>")
Response.Write("<td><A HRef='portal_setup_billingII.asp?OrderPId=" & RS.Fields("CustomerParid").Value & "&OrderId=" & RS.Fields("OrderId").Value & "'>" & tr("Awaiting Authorization") & "</A></td></TR>")
Response.Write("<td>" & tr("Awaiting Authorization") & "</td></TR>")
Response.Write ("<td>" & tr("RA Ass14151igned") & "</td>")
Response.Write ("<td>" & tr("RA %Ass_!igned") & "</td>")
And received the following output:
$matches Array:
(
[0] => Array
(
[0] => tr("RA Assigned")
[1] => tr("Not Yet Assigned")
[2] => tr("Edit")
[3] => tr("Awaiting Authorization")
[4] => tr("Awaiting Authorization")
[5] => tr("RA Ass14151igned")
[6] => tr("RA %Ass_!igned")
)
[1] => Array
(
[0] => RA Assigned
[1] => Not Yet Assigned
[2] => Edit
[3] => Awaiting Authorization
[4] => Awaiting Authorization
[5] => RA Ass14151igned
[6] => RA %Ass_!igned
)
)

This should do it; use a non-greedy ? after * or +:
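using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        const string pattern = "tr\\(\".*?\"\\)"; // lazy .*? stops at the first ")
        const string text = "tr(\"RA Assigned %2\") & \"</td>\")";
        Regex r = new Regex(pattern, RegexOptions.Compiled);
        Match m = r.Match(text);
        while (m.Success)
        {
            // Group 0 has a single capture: the whole tr("...") call.
            foreach (Capture c in m.Captures)
            {
                Console.WriteLine(c.Value); // prints tr("RA Assigned %2")
            }
            m = m.NextMatch();
        }
    }
}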

Related

Pattern Matching Scala Regex evaluation

Imagine you have a String that contains the ampersand symbol &.
My goal is to add a space between the & and the adjacent character wherever one is missing.
For example:
Case 1: Body&Soul --> Body & Soul (working)
Case 2: Body &Soul --> Body & Soul (working)
Case 3: Body& Soul --> Body & Soul (working)
Case 4: Body&Soul&Mind --> Body & Soul & Mind (working)
Case 5: Body &Soul& Mind --> Body & Soul & Mind (not working)
Case 6: Body& Soul &Mind --> Body & Soul & Mind (not working)
def replaceEmployerNameContainingAmpersand(emplName: String): String = {
  val r = "(?<! )&(?! )".r.unanchored
  val r2 = "&(?! )".r.unanchored
  val r3 = "(?<! )&".r.unanchored
  emplName match {
    case r() => emplName.replaceAll("(?<! )&(?! )", " & ")
    case r2() => emplName.replaceAll("&(?! )", "& ")
    case r3() => emplName.replaceAll("(?<! )&", " &")
  }
}
The goal is to fix Case 5 & 6: Body &Soul& Mind or Body& Soul &Mind --> Body & Soul & Mind
But it's not working: when case 2 or 3 occurs, the match expression exits after the first matching case, so only one kind of replacement is applied and the second & symbol is never fixed.
Can anyone help me on how to match case 5 and 6?
You may capture a single optional whitespace char on both ends of a & and check if they matched, and replace accordingly using replaceAllIn:
def replaceAllIn(target: CharSequence, replacer: (Match) => String): String
Replaces all matches using a replacer function.
See the Scala demo:
val s = "Body&Soul, Body &Soul, Body& Soul, Body&Soul&Mind, Body &Soul& Mind, Body& Soul &Mind"
val pattern = """(\s)?&(\s)?""".r
val res = pattern.replaceAllIn(s, m => (if (m.group(1) != null) m.group(1) else " ") + "&" + (if (m.group(2) != null) m.group(2) else " ") )
println(res)
// => Body & Soul, Body & Soul, Body & Soul, Body & Soul & Mind, Body & Soul & Mind, Body & Soul & Mind
The (\s)?&(\s)? pattern optionally matches a single whitespace char and captures it into Group 1, then matches &, and then optionally captures a single whitespace char into Group 2.
If Group 1 is not null, a whitespace char was found and we keep it; otherwise we insert a space. The same logic is used for the trailing side.
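The same optional-group idea can be sketched outside Scala. In this Python version (Python is used here only for illustration), re.sub accepts a replacement function, and a group that did not participate in the match comes back as None, which plays the role of null in the Scala answer:

```python
import re

# Group 1 / Group 2: an optional single whitespace char before / after the '&'.
AMP = re.compile(r'(\s)?&(\s)?')

def space_ampersands(s: str) -> str:
    # A group that did not participate is None: keep the captured char, else insert one space.
    return AMP.sub(lambda m: (m.group(1) or ' ') + '&' + (m.group(2) or ' '), s)

s = "Body&Soul, Body &Soul, Body& Soul, Body&Soul&Mind, Body &Soul& Mind, Body& Soul &Mind"
print(space_ampersands(s))
# Body & Soul, Body & Soul, Body & Soul, Body & Soul & Mind, Body & Soul & Mind, Body & Soul & Mind
```

Because every & is handled independently by a single pass, cases 5 and 6 need no special treatment. Text that is already spaced correctly is left unchanged.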

Regular Expressions Finding A Set of Numbers

I am stumped on trying to figure out regular expressions so I thought I would ask the big dogs.
I have a string that can contain one to four sets, as follows:
1234-abcd, baa74739, maps21342, 6789
I have figured out the regular expressions for 1234-abcd, baa74739, and maps21342. However, I am having trouble figuring out a pattern to pull out the numbers that stand alone. Does anyone have an idea for a way around this?
Example of the regex I used:
dbout.Range("D7").Formula = "=RegexExtract(DH7," & Chr(34) & "([M][A][P][S]\d+)" & Chr(34) & ")"
dbout.Range("D7").AutoFill Destination:=dbout.Range("D7:D2000")
For the stand-alone digits, replace
dbout.Range("D7").Formula = "=RegexExtract(DH7," & Chr(34) & "([M][A][P][S]\d+)" & Chr(34) & ")"
dbout.Range("D7").AutoFill Destination:=dbout.Range("D7:D2000")
with
dbout.Range("D7").Formula = "=RegexExtract(DH7," & Chr(34) & "(\b\d+\b)" & Chr(34) & ")"
dbout.Range("D7").AutoFill Destination:=dbout.Range("D7:D2000")
Or, equivalently, using doubled quotes instead of Chr(34):
dbout.Range("D7").Formula = "=RegexExtract(DH7,""(\b\d+\b)"")"
dbout.Range("D7").AutoFill Destination:=dbout.Range("D7:D2000")
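One caveat applies, assuming RegexExtract returns the first match it finds. \b\d+\b also matches the 1234 in 1234-abcd, because - is a non-word character and therefore creates a word boundary after the 4. The quick check below shows this. Python is only a stand-in here; for plain ASCII input like this, \b behaves the same way in most engines. The check also includes a hypothetical alternative that only accepts digits forming an entire comma-separated item:

```python
import re

s = "1234-abcd, baa74739, maps21342, 6789"

# \b sits between '4' and '-', so the leading 1234 also counts as a "stand-alone" number.
word_bounded = re.findall(r'\b\d+\b', s)

# Alternative: digits that make up an entire comma-separated item.
# Group 1 holds the number; the preceding comma/whitespace is consumed but not captured.
whole_item = re.findall(r'(?:^|,)\s*(\d+)\s*(?=,|$)', s)

print(word_bounded)  # ['1234', '6789']
print(whole_item)    # ['6789']
```

With RegexExtract you would likely want the capture group rather than the whole match, because the whole match also includes the comma and space. How the function exposes groups depends on its implementation.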

Extract numeric info from text

I need to extract numeric info from text.
Ready
State: CTYG Work Request #: 2880087 General
Job Address
Contact
Work Request Search
My code:
$Text = WinGetText("[ACTIVE]")
Sleep(4000)
$Value = StringSplit($Text, @CRLF)
MsgBox(0, "Hello", $Value, 10) ;---1st message box
Sleep(4000)
For $i = 1 To $Value[0]
    If StringRegExp($Value[$i], "[0-9][^:alpha:]") Then
        MsgBox(0, "Hello1", $Value[$i], 5) ;---2nd message box
        Sleep(200)
        $newWR = $Value[$i]
        MsgBox(0, "Hello2", $newWR, 10) ;---3rd message box
        ConsoleWrite($newWR)
    EndIf
Next
1st MsgBox() shows nothing. The 2nd and 3rd show State: CTYG Work Request #: 2880087 General. But I don't need the entire line, I just want 2880087.
What about this? This will delete everything but numbers.
$str = "State: CTYG Work Request #: 2880087 General"
ConsoleWrite(StringRegExpReplace($str, '\D', '') & @CRLF)
… I just want 2880087 …
Example using the regular expression State: .+ #: (\d+):
#include <StringConstants.au3> ; StringRegExp()
#include <Array.au3>
Global Const $g_sText = 'Ready' & @CRLF & @CRLF _
    & 'State: CTYG Work Request #: 2880087 General' & @CRLF & @CRLF _
    & 'Job Address' & @CRLF & @CRLF _
    & 'Contact' & @CRLF & @CRLF _
    & 'Work Request Search'
Global Const $g_sRegEx = 'State: .+ #: (\d+)'
Global Const $g_aResult = StringRegExp($g_sText, $g_sRegEx, $STR_REGEXPARRAYMATCH)
ConsoleWrite($g_sText & @CRLF)
_ArrayDisplay($g_aResult)
This stores 2880087 in $g_aResult[0].
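Both answers can be tried quickly outside AutoIt. This Python sketch is an illustration only. It uses the sample text from the question, with \r\n line endings standing in for AutoIt's CRLF macro. It applies the strip-non-digits approach to the single line and the capture-group approach to the whole text:

```python
import re

text = ("Ready\r\n\r\n"
        "State: CTYG Work Request #: 2880087 General\r\n\r\n"
        "Job Address\r\n\r\nContact\r\n\r\nWork Request Search")

line = "State: CTYG Work Request #: 2880087 General"

# Approach 1: delete every non-digit character.
# This only gives the right answer if the line contains a single number.
digits_only = re.sub(r'\D', '', line)

# Approach 2: capture the run of digits that follows "#: " on the State line.
m = re.search(r'State: .+ #: (\d+)', text)
work_request = m.group(1) if m else None

print(digits_only, work_request)
```

The capture-group version is the more robust of the two, because it does not concatenate unrelated digits from elsewhere on the line.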

regex to replace captured group, want to filter by linestart

(notice the newline before Bob)
my_string = "Alice & 1 & a \nBob & 2 & b"
gsub("(?m)(?<=& )(.+?)","(\\1)", my_string, perl=TRUE)
> "Alice & (1) & (a) \nBob & (2) & (b)"
How do I adjust the regex to only parenthesise the entries in the line that starts with Alice?
All variations of ^A that I've tried either capture Alice itself or capture only the first occurrence of the group after Alice.
Edit: expected output:
"Alice & (1) & (a) \nBob & 2 & b"
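Use (*SKIP)(*F), with the (?m) flag so that ^ matches at the start of every line:
gsub("(?m)^(?!Alice\\b).*(*SKIP)(*F)|(?<=& )(\\S+)", "(\\1)", my_string, perl=TRUE)
Lines that do not start with Alice are consumed by the first alternative and then skipped. On the remaining lines, the second alternative wraps every non-space run that follows "& ".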
Not sure how efficient this is, but you can always apply it to a vector containing a separate entry for each line:
l <- strsplit(my_string, "\n")[[1]]
paste(ifelse(substr(l, 1, 5) == "Alice", gsub("(?<=& )(.+?)", "(\\1)", l, perl=TRUE), l), collapse = "\n")
# [1] "Alice & (1) & (a) \nBob & 2 & b"
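Python's built-in re module does not support the PCRE verbs (*SKIP) and (*F). A Python sketch of the same task therefore follows the second answer's per-line approach. It uses \S+ from the first answer rather than the question's .+?, which captures just one character per entry:

```python
import re

my_string = "Alice & 1 & a \nBob & 2 & b"

def parenthesise_alice(text: str) -> str:
    out = []
    for line in text.split("\n"):
        if line.startswith("Alice"):
            # Wrap each non-space run that directly follows "& ".
            line = re.sub(r'(?<=& )(\S+)', r'(\1)', line)
        out.append(line)
    return "\n".join(out)

result = parenthesise_alice(my_string)
print(repr(result))  # 'Alice & (1) & (a) \nBob & 2 & b'
```

Working line by line keeps the regex simple. The "only lines starting with Alice" condition lives in ordinary code instead of inside the pattern.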

Regex to match data from a webpage

This is probably a simple question for someone experienced with regex, but I'm having a little trouble. I'm looking to match lines of data like this shown below:
SomeAlpha Text CrLf CrLf 15 CrLf CrLf 123 132 143 CrLf CrLf 12313 CrLf CrLf 12/123
Where the "SomeAlpha Text" is just some text with space and potentially punctuation. The first number is something between 1 and 30,000. The second set of numbers (123 132 143) are between 1 and 500,000 (each number). The next number is somewhere between 1 and 500,000. The final set is (1–30,000)/(1–30,000). This is the code I've put together so far:
Dim Pattern As String = "[.*]{1,100}" & vbCrLf & "" & vbCrLf & "[0-9]{1,4}" & vbCrLf & "" & vbCrLf & "[0-9]{1,6] [0-9]{1,6] [0-9]{1,6]" & vbCrLf & "" & vbCrLf & "[0-9]{1,6}" & vbCrLf & "" & vbCrLf & "[0-9]{1,5}/[0-9]{1,5}"
For Each match As Match In Regex.Matches(WebBrowser1.DocumentText.ToString, Pattern, RegexOptions.IgnoreCase)
RichTextBox1.AppendText(match.ToString & Chr(13) & Chr(13))
Next
And I'm currently getting 0 matches, even though I know there should be at least 1 match. Any advice on where my pattern is wrong would be great! Thanks.
"[.*]{1,100}" & vbCrLf & "" & vbCrLf & "[0-9]{1,4}" & vbCrLf & "" & vbCrLf & "[0-9]{1,6] [0-9]{1,6] [0-9]{1,6]" & vbCrLf & "" & vbCrLf & "[0-9]{1,6}" & vbCrLf & "" & vbCrLf & "[0-9]{1,5}/[0-9]{1,5}"
has quite a few problems:
Inside a character class, . and * lose their special meaning, so "[.*]{1,100}" matches only 1 to 100 literal periods and asterisks, never ordinary text. Replace it with ".{1,100}" or ".*"
You say the first number is between 1 and 30,000. "[0-9]{1,4}" only allows up to 4 digits (0 to 9999). Replace it with "[0-9]{1,5}", which allows any number from 0 to 99999.
You accidentally put ] instead of } at three places in this part: "[0-9]{1,6] [0-9]{1,6] [0-9]{1,6]". Replace it with "[0-9]{1,6} [0-9]{1,6} [0-9]{1,6}"
Try doing what I said above. It should work correctly.
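The first point above is easy to check, and so is the corrected pattern. This Python sketch builds a hypothetical record from the layout described in the question. It assumes vbCrLf corresponds to \r\n and relies on . excluding only \n, which is the case in both Python and .NET:

```python
import re

# Hypothetical record following the layout described in the question.
record = "SomeAlpha Text\r\n\r\n15\r\n\r\n123 132 143\r\n\r\n12313\r\n\r\n12/123"

blank = r"\r\n\r\n"  # corresponds to vbCrLf & "" & vbCrLf
fixed = (r".{1,100}" + blank + r"[0-9]{1,5}" + blank +
         r"[0-9]{1,6} [0-9]{1,6} [0-9]{1,6}" + blank +
         r"[0-9]{1,6}" + blank + r"[0-9]{1,5}/[0-9]{1,5}")

# Inside [...], '.' and '*' are literal, so this class only matches dots and asterisks.
literal_class_on_text = re.fullmatch(r"[.*]{1,100}", "SomeAlpha Text")
literal_class_on_dots = re.fullmatch(r"[.*]{1,100}", ".*.")

matches = re.findall(fixed, record)
print(literal_class_on_text, literal_class_on_dots is not None, matches == [record])
```

If the page text actually uses bare \n line breaks, the \r in each separator would have to be dropped or made optional (\r?\n).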