Regex match character after lazy grouping - regex

I want to match specifically the comma "," after the two groups ( AS) and (.*?).
I have a positive lookbehind that skips the AS, but I can't get the match to skip the lazy wildcard group.
Regex:
(?<= AS)(.*?)(,)
Sample text
SELECT LEFT(CustomerCode, 5) AS SMSiteCode, SUBSTRING(CustomerCode, 6, LEN(CustomerCode) - 5) AS SMCustCode, SUBSTRING(AgreeNo, 6, LEN(AgreeNo) - 5)
AS SMAgreeNo, CAST(SeqNo AS int) AS SeqNo, SUBSTRING(TrxDate, 7, 2) + SUBSTRING(TrxDate, 4, 2) + SUBSTRING(TrxDate, 1, 2) AS TrxDate, TrxTime,
CAST(Charge AS bit) AS Charge, CASE WHEN LEN(AnalysisCode) > 5 THEN SUBSTRING(AnalysisCode, 6, LEN(AnalysisCode) - 5)
ELSE AnalysisCode END AS AnalysisCode, CAST(ISNULL(Description, N'') AS nvarchar(100)) AS Description, CAST(TaxAmt AS money) AS TaxAmt,
CAST(TotAmt AS money) AS TotAmt, CAST(Match AS bigint) AS Match, CAST(Confirmed AS bit) AS Confirmed, CAST(Balance AS money) AS Balance,
CAST(QtyBal AS money) AS QtyBal, CAST(ISNULL(Drawer, N'') AS nvarchar(50)) AS Drawer, SUBSTRING(DateBanked, 7, 2) + SUBSTRING(DateBanked, 4, 2)
+ SUBSTRING(DateBanked, 1, 2) AS DateBanked, CAST(ISNULL(BankBranch, N'') AS nvarchar(50)) AS BankBranch, CAST(Qty AS float) AS Qty, CAST(ISNULL(Narration,
N'') AS nvarchar(100)) AS Narration, SUBSTRING(DateFrom, 7, 2) + SUBSTRING(DateFrom, 4, 2) + SUBSTRING(DateFrom, 1, 2) AS DateFrom, SUBSTRING(DateTo, 7, 2)
+ SUBSTRING(DateTo, 4, 2) + SUBSTRING(DateTo, 1, 2) AS DateTo, CAST(PrintNarration AS bit) AS PrintNarration, CAST(DiscAmt AS float) AS DiscAmt,
CAST(ISNULL(CCAuthNo, N'') AS nvarchar(20)) AS CCAuthNo, CAST(ISNULL(CCTransID, N'') AS nvarchar(20)) AS CCTransID, CAST(UserLogin AS nvarchar(20))
AS UserLogin, CAST(Reconciled AS bit) AS Reconciled, SUBSTRING(DateReconciled, 7, 2) + SUBSTRING(DateReconciled, 4, 2) + SUBSTRING(DateReconciled, 1, 2)
AS DateReconciled, CAST(PrimaryKey AS bigint) AS PrimaryKey, SUBSTRING(InvDate, 7, 2) + SUBSTRING(InvDate, 4, 2) + SUBSTRING(InvDate, 1, 2) AS InvDate,
CAST(InvNo AS int) AS InvNo FROM SomeDatabase.dbo.tblTransaction WHERE IsDate(trxTime) = 1

You could try \K, but make sure to change the flavor in RegExr (top right of the screen) from JavaScript to PCRE.
\K is defined as:
Sets the given position in the regex as the new "start" of the match. This means that nothing preceding the \K will be captured in the overall match.
With \K, you could try something like this:
(?<= AS).*?\K(,)
Example: https://regex101.com/r/X3AdbH/1/

If \K is supported, you could get your matches without using a lookbehind and a capturing group by matching AS and use a negated character class to match any char except a comma.
AS [^,]+\K,
Explanation
AS Match space, AS and space
[^,]+ Match 1+ times any char except a comma
\K, Forget what was matched and match a comma
Regex demo
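If your engine has no \K (Python's built-in re module, for example, does not support it), a capturing group around the comma gives the same information. This is only a sketch on a shortened, made-up sample of the query; the variable names are illustrative:

```python
import re

# Shortened, hypothetical sample in the same shape as the query above.
sql = "SELECT LEFT(CustomerCode, 5) AS SMSiteCode, CAST(SeqNo AS int) AS SeqNo, TrxTime FROM t"

# Match " AS <alias>" and capture only the comma that follows it.
pattern = re.compile(r" AS [^,]+(,)")

# m.start(1) is the index of the captured comma, not of the whole match.
comma_positions = [m.start(1) for m in pattern.finditer(sql)]

# The word immediately before each captured comma is the column alias.
aliases = [sql[:pos].split()[-1] for pos in comma_positions]
```

The comma inside LEFT(CustomerCode, 5) is not reported, because it is not preceded by " AS " plus a run of non-comma characters.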

I'm guessing that your expression is just fine, you maybe want to limit the first capturing group to some specific chars, if you wish, maybe looking like:
(?<= AS)([A-Za-z\d\s]+)(,)
The expression is explained on the top right panel of regex101.com, if you wish to explore/simplify/modify it, and in this link, you can watch how it would match against some sample inputs, if you like.

Related

How to extract the operands on both sides of "==" using regex?

Language and package
python3.8, regex
Description
The inputs and wanted outputs are listed as following:
if (programWorkflowState.getTerminal(1, 2) == Boolean.TRUE) {
Want: programWorkflowState.getTerminal(1, 2) and Boolean.TRUE
boolean ignore = !_isInStatic.isEmpty() && (_isInStatic.peek() == 3) && isAnonymous;
Want: _isInStatic.peek() and 3
boolean b = (num1 * ( 2 + num2)) == value;
Want: (num1 * ( 2 + num2)) and value
My current regex
((?:\((?:[^\(\)]|(?R))*\)|[\w\.])+)\s*==\s*((?:\((?:[^\(\)]|(?R))*\)|[\w\.])+)
This pattern is meant to match \((?:[^\(\)]|(?R))*\) or [\w\.] on both sides of "=="
Result on regex101.com
Problem: It failed to match the recursive part (num1 * ( 2 + num2)).
The explanation of the recursive pattern \((?:m|(?R))*\) is here
But if I only use the recursive pattern, it succeeded to match (num1 * ( 2 + num2)) as the image shows.
What's the right regex to achieve my purpose?
The \((?:m|(?R))*\) pattern contains a (?R) construct (equal to (?0) subroutine) that recurses the entire pattern.
You need to wrap the pattern you need to recurse with a group and use a subroutine instead of (?R) recursion construct, e.g. (?P<aux>\((?:m|(?&aux))*\)) to recurse a pattern inside a longer one.
You can use
((?:(?P<aux1>\((?:[^()]++|(?&aux1))*\))|[\w.])++)\s*[!=]=\s*((?:(?&aux1)|[\w.])+)
See this regex demo (it takes just 6875 steps to match the string provided, yours takes 13680)
Details
((?:(?P<aux1>\((?:[^()]++|(?&aux1))*\))|[\w.])++) - Group 1, matches one or more occurrences (possessively, due to ++, not allowing backtracking into the pattern so that the regex engine could not re-try matching a string in another way if the subsequent patterns fail to match)
(?P<aux1>\((?:[^()]++|(?&aux1))*\)) - an auxiliary group "aux1" that matches (, then zero or more occurrences of either 1+ chars other than ( and ) or the whole Group "aux1" pattern, and then a )
| - or
[\w.] - a letter, digit, underscore or .
\s*[!=]=\s* - != or == with zero or more whitespace on both ends
((?:(?&aux1)|[\w.])+) - Group 2: one or more occurrences of the Group "aux1" pattern or a letter, digit, underscore or a dot.
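Note that the subroutine syntax above needs PCRE or the third-party regex package; Python's built-in re module has no recursion. If you are limited to re, one alternative is a small hand-written scanner instead of a recursive pattern. The sketch below is not the regex above: it handles only ==, and the helper names (scan_left, scan_right, operands) are made up for illustration.

```python
import re

WORD_CHAR = re.compile(r"[\w.]")

def scan_left(text, i):
    """Move i left over word characters and balanced (...) groups."""
    while i > 0:
        ch = text[i - 1]
        if ch == ")":
            depth = 0
            # Walk back to the "(" that balances this ")".
            while i > 0:
                i -= 1
                if text[i] == ")":
                    depth += 1
                elif text[i] == "(":
                    depth -= 1
                    if depth == 0:
                        break
        elif WORD_CHAR.match(ch):
            i -= 1
        else:
            break
    return i

def scan_right(text, j):
    """Move j right over word characters and balanced (...) groups."""
    while j < len(text):
        ch = text[j]
        if ch == "(":
            depth = 0
            # Walk forward past the ")" that balances this "(".
            while j < len(text):
                if text[j] == "(":
                    depth += 1
                elif text[j] == ")":
                    depth -= 1
                j += 1
                if depth == 0:
                    break
        elif WORD_CHAR.match(ch):
            j += 1
        else:
            break
    return j

def operands(line):
    """Return a (left, right) pair for every '==' found in line."""
    pairs = []
    for m in re.finditer(r"\s*==\s*", line):
        left = line[scan_left(line, m.start()):m.start()]
        right = line[m.end():scan_right(line, m.end())]
        pairs.append((left, right))
    return pairs
```

For example, operands("boolean b = (num1 * ( 2 + num2)) == value;") yields the pair "(num1 * ( 2 + num2))" and "value".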

Valid regex for number(a,b) format

How can I express number(a,b) in regex? Example:
number(5,2) can be 123.45 but also 2.44
The best I got is: ([0-9]{1,5}.[0-9]{1,2}) but it isn't enough because it wrongly accepts 12345.22.
I thought about doing multiple OR (|) but that can be too long in case of a long format such as number(15,5)
You might use
(?<!\S)(?!(?:[0-9.]*[0-9]){6})[0-9]{1,5}(?:\.[0-9]{1,2})?(?!\S)
Explanation
(?<!\S) Negative lookbehind, assert what is on the left is not a non whitespace char
(?! Negative lookahead, assert what is on the right is not
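(?:[0-9.]*[0-9]){6} Match 6 digits, each optionally preceded by digits or dots (so a token with more than 5 digits in total is rejected)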
) Close lookahead
[0-9]{1,5} Match 1 - 5 times a digit 0-9
(?:\.[0-9]{1,2})? Optionally match a dot and 1 - 2 digits
(?!\S) Negative lookahead, assert what is on the right is not a non whitespace char
Regex demo
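To avoid writing the pattern by hand for a long format such as number(15,5), you can build it from a and b. A minimal sketch in Python, assuming the same interpretation as the pattern above (at most a digits in total, at most b after the dot); number_pattern is a made-up helper name:

```python
import re

def number_pattern(a, b):
    """Build the pattern above for number(a, b)."""
    return re.compile(
        rf"(?<!\S)"                        # not preceded by a non-whitespace char
        rf"(?!(?:[0-9.]*[0-9]){{{a + 1}}})"  # reject a + 1 or more digits in total
        rf"[0-9]{{1,{a}}}"                 # 1 to a digits
        rf"(?:\.[0-9]{{1,{b}}})?"          # optional dot and 1 to b digits
        rf"(?!\S)"                         # not followed by a non-whitespace char
    )

pattern = number_pattern(5, 2)
found = pattern.findall("a 123.45 b 12345.22 c 2.44")
```

Here found contains 123.45 and 2.44, while 12345.22 is skipped because it has seven digits.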
I don't know Scala, but you would need to input those numbers when building your regular expression.
val a = 5
val b = 2
val regex = (raw"\((?=\d{1," + a + raw"}(?:\.0+)?|(?:(?=.{1," + (a + 1) + raw"}0*)(?:\d+\.\d{1," + b + raw"}))).+\)").r
For the scenario above, this checks that the value has either at most 5 digits in total, or at most 6 characters (including the decimal point) with at most 2 digits after the decimal point. Because a and b are variables set in code, the same expression covers other formats.

Scala: Tokenizing simple arithmetic expressions

How can I split 23+3*5 or 2 + 3*5 into a list List("23", "+", "3", "*", "5")?
I tried things like split and splitAt, but nothing gave the desired result.
I want it to split at the arithmetic operators.
Try something like
"2 + 4 - 3 * 5 / 7 / 3".split("(?=[+/*-])|(?<=[+/*-])").map(_.trim)
In this particular example, it gives you:
Array(2, +, 4, -, 3, *, 5, /, 7, /, 3)
The (?= ) are lookaheads, (?<= ) are lookbehinds. Essentially, it cuts the string before and after every operator. Note that - in [+/*-] is at the last position: otherwise it's interpreted as a character range (e.g. [a-z]).
I suggest matching on what you want as tokens.
e.g.
"\\d+|[-+*/]".r.findAllIn(" 23 + 4 * 5 / 7").toList
// List(23, +, 4, *, 5, /, 7)
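The same "match the tokens" idea carries over to other regex flavors. As a rough sketch in Python (the tokenize name is just for illustration), using the two inputs from the question:

```python
import re

# A token is either a run of digits or a single arithmetic operator.
TOKEN = re.compile(r"\d+|[-+*/]")

def tokenize(expr):
    """Return the numbers and operators in expr, ignoring whitespace."""
    return TOKEN.findall(expr)

compact = tokenize("23+3*5")
spaced = tokenize("2 + 3*5")
```

Because whitespace matches neither alternative, it is skipped and no trimming step is needed.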

Extracting inner groups with regex

I have the following string
([Valor][Corr][Fat]: 6M UC x Viz. Lógicos IN('3','6')) AND (((SUM_RevisionAnomalia_UltRevision_1M = 1) AND (CANT_ConsumoFact_UltRevision_1M > 1)) OR ((SUM_RevisionNoAnomalia_UltRevision_1M + 1) AND (CANT_ConsumoFact_UltRevision_1M BETWEEN 1 - 2))) OR (SUM_RevisionNoAnomalia_UltRevision_1M <= 1)
and I am trying to extract all inner groups, so my answer should contain
([Valor][Corr][Fat]: 6M UC x Viz. Lógicos IN('3','6'))
(SUM_RevisionAnomalia_UltRevision_1M = 1)
(CANT_ConsumoFact_UltRevision_1M > 1)
(SUM_RevisionNoAnomalia_UltRevision_1M + 1)
(CANT_ConsumoFact_UltRevision_1M BETWEEN 1 - 2)
(SUM_RevisionNoAnomalia_UltRevision_1M <= 1)
It is quite easy to extract this when there is only 1 set of those strings inside parentheses, but when given the example above my regex captures the whole string.
The regex I am using is
/(\([a-zA-Z0-9\[\]:_+=-\s\.\(\),'óáéíúüçãôàäê><]+\))/g
It seems you just want to match what is in-between ( and ) that is not ( and ) unless these are (...) that are preceded with a word character.
You can use
\((?:[^()]|\b\([^()]*\))*\)
See the regex demo
The regex breakdown:
\( - matching a literal (
(?:[^()]|\b\([^()]*\))* - zero or more sequences of:
[^()] - any character other than ( and )
| - or...
\b\([^()]*\) - a word boundary (i.e. before that position, there must be a word character) followed with ( followed with zero or more characters other than ( and ), and then a closing )
\) - a closing )
An alternative pattern can be an unrolled one (more efficient with longer inputs):
\([^()]*(?:\b\([^()]*\)[^()]*)*\)
See another demo
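To see the first pattern at work outside a demo page, here is a small sketch in Python on a shortened, made-up condition with the same shape as yours (one function-like IN(...) call and some nested groups):

```python
import re

# Shortened, hypothetical sample with the same nesting shape as the question's string.
text = "(a IN('3','6')) AND (((x = 1) AND (y > 1)) OR (z <= 1))"

# Innermost (...) groups, allowing a nested (...) only right after a word character.
inner = re.compile(r"\((?:[^()]|\b\([^()]*\))*\)")

groups = inner.findall(text)
```

The outer wrapping parentheses are skipped because a ( that directly follows another ( is not at a word boundary.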

Regex: match and tokenize in Scala

I am trying to extract certain patterns from the input string. These patterns are +, - , *, / , (, ), log , integer and float numbers.
Here's example for the needed behavior:
//input string
var str = "log6*(12+5)/2-34.2"
//wanted result
var rightResp = Array("log","6","*","(","12","+","5",")","/","2","-","34.2")
I have tried to do this for some time but I have to admit that regex is not my specialty. Next piece of code shows where I am stuck:
import scala.util.matching.Regex
var str = "log6*(12+5)/2-34.2"
val pattern = new Regex("(\\+|-|log|\\*|\\/|[0-9]*\\.?[0-9]*)")
pattern.findAllIn(str).toArray
The result is not good because the brackets "(" and ")" are not matched, and the numbers, both the integers (6, 12, 5, 2) and the float (34.2), are messed up. Thanks for your help!
You can use
[+()*/-]|log|[0-9]*\\.?[0-9]+
See regex demo
The regex contains 3 alternatives joined with the help of | alternation operator.
[+()*/-] - matches a single literal character: +, (, ), *, /, - (note that the hyphen is not escaped as it is at the end of the character class)
log - a literal letter sequence log
[0-9]*\\.?[0-9]+ - a float number that accepts values like .05, 5.55 as it matches...
[0-9]* - 0 or more digits
\\.? - an optional (1 or 0) literal period
[0-9]+ - 1 or more digits.
Here is a Scala code sample:
import scala.util.matching.Regex
object Main extends App {
  val str = "log6*(12+5)/2-34.2"
  val pattern = new Regex("[+()*/-]|log|[0-9]*\\.?[0-9]+")
  val res = pattern.findAllIn(str).toArray
  // mkString works directly on an Array; .deep was removed in Scala 2.13
  println(res.mkString(", "))
}
Result: log, 6, *, (, 12, +, 5, ), /, 2, -, 34.2
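If you want to check the pattern quickly outside Scala, it behaves the same way in other engines. A minimal sketch in Python (the dot needs only a single backslash in a raw string):

```python
import re

# Same three alternatives: a single operator or bracket, the word "log", or a number.
TOKEN = re.compile(r"[+()*/-]|log|[0-9]*\.?[0-9]+")

tokens = TOKEN.findall("log6*(12+5)/2-34.2")
```

The tokens list holds the same twelve items as the Scala result above.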