How to find any non-digit characters using RegEx in ABAP - regex

I need a Regular Expression to check whether a value contains any other characters than digits between 0 and 9.
I also want to check the length of the value.
The RegEx I've made: ^([0-9]\d{6})$
My test values are: 123Z45 and 123456
The ABAP code:
FIND ALL OCCURRENCES OF REGEX '^([0-9]\d{6})$' IN L_VALUE RESULTS DATA(LT_RESULTS).
I'm expecting a result in LT_RESULTS when I test the first value, '123Z45', because it contains a non-digit character.
But LT_RESULTS is empty in nearly every test case.

Your expression ^([0-9]\d{6})$ translates to:
^ - start of input
( - begin capture group
[0-9] - a character between 0 and 9
\d{6} - six digits (digit = character between 0 and 9)
) - end capture group
$ - end of input
So it will only match 1234567 (7 digit strings), not 123456, or 123Z45.
If you just need to find a string that contains non digits you could use the following instead: ^\d*[^\d]+\d*$
* - previous element may occur zero, one or more times
[^\d] - ^ right after [ means "NOT", i.e. any character which is not a digit
+ - previous element may occur one or more times
Example:
const expression = /^\d*[^\d]+\d*$/;
const inputs = ['123Z45', '123456', 'abc', 'a21345', '1234f', '142345'];
console.log(inputs.filter(i => expression.test(i)));
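The pattern above only matches when the non-digits form one contiguous run, so a value such as a1b is not reported. As a hedged sketch in Python (not ABAP), the two checks from the question can be kept separate: one search for any character outside 0-9, and one full match for the expected length. The function names and the default length of 6 are assumptions made for this illustration.

```python
import re

def has_non_digit(value: str) -> bool:
    """True if at least one character is not an ASCII digit 0-9."""
    return re.search(r"[^0-9]", value) is not None

def is_n_digits(value: str, length: int = 6) -> bool:
    """True if the whole value consists of exactly `length` ASCII digits."""
    return re.fullmatch(r"[0-9]{%d}" % length, value) is not None

# Print both checks for a few sample values.
for sample in ["123Z45", "123456", "a1b"]:
    print(sample, has_non_digit(sample), is_n_digits(sample))
```

The same split should carry over to ABAP: a pattern like [^0-9] for the "contains a non-digit" test, and an anchored pattern such as ^[0-9]{6}$ for the length test.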

You can also use this character class if you want to extract a non-digit group (here, a group of letters):
DATA(l_guid) = '0074162D8EAA549794A4EF38D9553990680B89A1'.
DATA(regx) = '[[:alpha:]]+'.
DATA(substr) = match( val = l_guid
regex = regx
occ = 1 ).
It finds the first group of non-digit characters (letters, with this class) and returns it.
If you just want to check whether any exist, or how many there are in your string, the count built-in function is your friend:
DATA(how_many) = count( val = l_guid regex = regx ).
DATA(yes) = boolc( count( val = l_guid regex = regx ) > 0 ).
Match and count exist since ABAP 7.50.

If you don't need a Regular Expression for something more complex, ABAP has some nice comparison operators CO (Contains Only), CA, NA etc for you. Something like:
IF L_VALUE CO '0123456789' AND STRLEN( L_VALUE ) = 6.

Related

Regex and LINQ extract group by group name

I have a SQL statement in the form of a string in which some parameters are written in a standard way: "<parameter1>, <parameter2>".
Then I have another string with the parameter values, also written in a standard way: "Parameter1=123; Parameter2=aaa".
I need to match the parameters in the first SQL string with the values in the second one.
What I have so far:
Dim BodySQL = "Blablabal WHERE X=<Parameter1> AND Y=<Parameter2>"
Dim vmp As RegularExpressions.MatchCollection = RegularExpressions.Regex.Matches("Parameter1=2555; Parameter2 = 12/02/2021", "([\w ]+)=([\w ]+)")
Dim vmc As RegularExpressions.MatchCollection = RegularExpressions.Regex.Matches(BodySQL, "(?<=\<).+?(?=\>)")
For Each vm As RegularExpressions.Match In vmc
Dim Vl As String = (From m As RegularExpressions.Match In vmp
Where m.Groups(1).Value.Trim = vm.Value.ToString).Select(Of String)(Function(f) f.Groups(2).Value).ElementAt(0).Trim
BodySQL = BodySQL.Replace(vm.Value, Vl)
Next
It works for the first parameter, but then I get
"System.ArgumentOutOfRangeException: 'Specified argument was out of the range of valid values.
Parameter name: index'"
Can I please ask why?
You can extract the keys and values with one regex from the param=value strings, create a dictionary out of them, and then use Regex.Replace to replace the matches with the dictionary values:
Imports System.Text.RegularExpressions
' ...
Dim BodySQL = "Blablabal WHERE X=<Parameter1> AND Y=<Parameter2>"
Dim args As New Dictionary(Of String, String)(StringComparer.InvariantCultureIgnoreCase)
' (StringComparer.InvariantCultureIgnoreCase) makes the dictionary keys case insensitive
For Each match As Match In Regex.Matches("PARAMETER1=2555; Parameter2 = 12/02/2021", "(\S+)\s*=\s*([^;]*[^;\s])")
args.Add(match.Groups(1).Value, match.Groups(2).Value)
Next
Console.WriteLine(Regex.Replace(BodySQL, "<([^<>]*)>",
Function(match)
Return If(args.ContainsKey(match.Groups(1).Value), args(match.Groups(1).Value), match.Value)
End Function))
Output:
Blablabal WHERE X=2555 AND Y=12/02/2021
The (\S+)\s*=\s*([^;]*[^;\s]) pattern matches
(\S+) - captures into Group 1 any one or more non-whitespace chars (the key value)
\s*=\s* - a = char enclosed with zero or more whitespace chars
([^;]*[^;\s]) - Group 2: any zero or more chars other than ; and then one char other than ; and whitespace (the value, it cannot be empty with this pattern. If you want it to be possible to match empty values, you will need to remove [^;\s] and then use Trim() on the match.Groups(2).Value in the code.)
The <([^<>]*)> regex matches
< - a < char (do not escape this char in any regex flavor please, it is never a special regex metachar)
([^<>]*) - Group 1: any zero or more chars other than < and >
> - a literal > char.
Since the key is in Group 1 and < and > on both ends are consumed, the < and > are removed when replacing with the found value.
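For comparison, here is a minimal Python sketch of the same idea: parse the key=value pairs into a dictionary, then let a replacement callback look up each <key> placeholder. It reuses the two patterns explained above; the function name fill_parameters and the lower-casing used for case-insensitive lookup are assumptions of this sketch, not part of the VB.NET answer.

```python
import re

def fill_parameters(body_sql: str, assignments: str) -> str:
    """Replace <Key> placeholders in body_sql with values from 'Key=value; ...'."""
    # Group 1 is the key, group 2 the value (no ';', no trailing whitespace).
    pairs = re.findall(r"(\S+)\s*=\s*([^;]*[^;\s])", assignments)
    # Lower-case the keys so that lookups ignore case.
    values = {key.lower(): value for key, value in pairs}

    def substitute(match: re.Match) -> str:
        # Unknown placeholders are left untouched.
        return values.get(match.group(1).lower(), match.group(0))

    return re.sub(r"<([^<>]*)>", substitute, body_sql)

print(fill_parameters(
    "Blablabal WHERE X=<Parameter1> AND Y=<Parameter2>",
    "PARAMETER1=2555; Parameter2 = 12/02/2021",
))
```

Because missing keys fall back to the original placeholder text, a parameter without a value does not raise an error here.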
The error is self explanatory. You are trying to access an array or List and specifying an index value that is either negative or larger than the largest index available.
.ElementAt(0) / m.Groups(1) / f.Groups(2)
My guess is that one of them might go out of bounds. Try to debug it with a breakpoint and check the values of your variables.
This is what I did with your code:
' Changed this regex so that the match also includes the "<" and ">"
Dim vmc = "\<(.*?)\>"
Dim BodySQL = "Blablabal WHERE X=<Parameter1> AND Y=<Parameter2>"
Dim args As New Dictionary(Of String, String)
' Changed the regex so that the value excludes the ";"
For Each match As Match In Regex.Matches("Parameter1=2555; Parameter2 = 12/02/2021", "(\S+)\s*=\s*(\S[^;]+)")
args.Add(match.Groups(1).Value, match.Groups(2).Value)
Next
Console.WriteLine(Regex.Replace(BodySQL, vmc,
Function(match As Match)
Return If(args.ContainsKey(match.Groups(1).Value), args(match.Groups(1).Value), match.Value)
End Function))
And now I have what I needed. Thanks a lot :)
Output:
WHERE X = 2555,Y = 12/02/2021

regex match longest substring with equal first and last char

/(\w)(\w*)\1/
For this string: "mgntdygtxrvxjnwksqhxuxtrv" I match "txrvxjnwksqhxuxt" (using Ruby), but not the even longer valid substring "tdygtxrvxjnwksqhxuxt".
For a given string, here are two ways to find the longest substring that begins and ends with the same character.
Suppose
str = "mgntdygtxrvxjnwksqhxuxtrv"
Use a regular expression
r = /(.)(?=(.*\1))/
str.gsub(r).map { $1 + $2 }.max_by(&:length)
#=> "tdygtxrvxjnwksqhxuxt".
When, as here, the regular expression contains capture groups, it may be more convenient to use String#gsub without a second argument or block (in which case it returns an enumerator, which can be chained) than String#scan ("If the pattern contains groups, each individual result is itself an array containing one entry per group."). Here gsub performs no substitutions; it merely generates matches of the regular expression.
The regular expression can be made self-documenting by writing it in free-spacing mode.
r = /
(.) # match any char and save to capture group 1
(?= # begin a positive lookahead
(.*\1) # match >= 0 characters followed by the contents of capture group 1
) # end the positive lookahead
/x # free-spacing regex definition mode
The following intermediate calculation is performed:
str.gsub(r).map { $1 + $2 }
#=> ["gntdyg", "ntdygtxrvxjn", "tdygtxrvxjnwksqhxuxt", "txrvxjnwksqhxuxt",
# "xrvxjnwksqhxux", "rvxjnwksqhxuxtr", "vxjnwksqhxuxtrv", "xjnwksqhxux",
# "xux"]
Notice that this does not enumerate all substrings beginning and ending with the same character (because .* is greedy). It does not generate, for example, the substring "xrvx".
Do not use a regular expression
v = str.each_char.with_index.with_object({}) do |(c,i),h|
if h.key?(c)
h[c][:size] = i - h[c][:start] + 1
else
h[c] = { start: i, size: 1 }
end
end.max_by { |_,h| h[:size] }.last
str[v[:start], v[:size]]
#=> "tdygtxrvxjnwksqhxuxt"
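The non-regex approach above can also be sketched in Python. This is an illustration under the same idea (remember where each character first appears, and measure the span every time it appears again), not a translation of the Ruby code line by line; the function name longest_same_ends is hypothetical.

```python
def longest_same_ends(text: str) -> str:
    """Return the longest substring whose first and last characters are equal."""
    first_seen = {}              # character -> index of its first occurrence
    best_start, best_len = 0, 0
    for i, ch in enumerate(text):
        # setdefault records the index on first sight, else returns the stored one.
        start = first_seen.setdefault(ch, i)
        length = i - start + 1
        if length > best_len:
            best_start, best_len = start, length
    return text[best_start:best_start + best_len]

print(longest_same_ends("mgntdygtxrvxjnwksqhxuxtrv"))
```

This makes a single pass over the string. When several substrings share the maximum length, the one found first is kept.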

Extract an 8 digits number from a string with additional conditions

I need to extract a number from a string with several conditions.
It has to start with 1-9, not with 0, and it will have 8 digits. Like 23242526 or 65478932
There will be either an empty space or a text variable before it. Like MMX: 23242526 or bgr65478932
It could contain commas in rare cases: 23,242,526
It ends with an empty space or a text variable.
Here are several examples:
From RE: Markitwire: 120432889: Mx: 24,693,059 I need to get 24693059
From Automatic reply: Auftrag zur Übertragung IRD Ref-Nr. MMX_23497152 need to get 23497152
From FW: CGMSE 2019-2X A1AN XS2022418672 Contract 24663537 need to get 24663537
From RE: BBVA-MAD MMX_24644644 + MMX_24644645 need to get 24644644, 24644645
Right now I'm using the RegexExtract function (found on this website), which extracts any 8-digit number starting with 2. However, it would also extract a number from, say, TGF00023242526, which is incorrect. Moreover, I don't know how to add additional conditions to the code.
=RegexExtract(A11, "(2\d{7})\b", ", ")
Thank you in advance.
Function RegexExtract(ByVal text As String, _
ByVal extract_what As String, _
Optional seperator As String = "") As String
Dim i As Long, j As Long
Dim result As String
Dim allMatches As Object
Dim RE As Object
Set RE = CreateObject("vbscript.regexp")
RE.Pattern = extract_what
RE.Global = True
RE.IgnoreCase = True
Set allMatches = RE.Execute(text)
For i = 0 To allMatches.Count - 1
For j = 0 To allMatches.Item(i).SubMatches.Count - 1
result = result & seperator & allMatches.Item(i).SubMatches.Item(j)
Next
Next
If Len(result) <> 0 Then
result = Right(result, Len(result) - Len(seperator))
End If
RegexExtract = result
End Function
You may create a custom boundary using a non-capturing group before the pattern you have:
(?:[\D0]|^)(2\d{7})\b
^^^^^^^^^^^
The (?:[\D0]|^) part matches either a non-digit (\D) or 0 or (|) start of string (^).
As an alternative to also match 8 digits in values like 23,242,526 and start with a digit 1-9 you might use
\b[1-9](?:,?\d){7}\b
\b Word boundary
[1-9] Match the first digit 1-9
(?:,?\d){7} Repeat 7 times matching an optional comma and a digit
\b Word boundary
Afterwards, you could replace the commas with an empty string.
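Note that \b sees the underscore and letters as word characters, so a pattern bounded by \b would skip values such as MMX_23497152 or bgr65478932. A hedged Python sketch of one way around this swaps the word boundaries for digit-only lookarounds and then strips the commas. The lookbehind is an assumption of this sketch: it is available in Python but, as far as I know, not in the VBScript regex engine used by the VBA function above. The name extract_ids is hypothetical.

```python
import re

# An 8-digit number starting with 1-9, optional commas between digits,
# not directly preceded or followed by another digit.
PATTERN = re.compile(r"(?<![0-9])[1-9](?:,?[0-9]){7}(?![0-9])")

def extract_ids(text: str) -> list:
    """Return all 8-digit numbers found in text, with commas removed."""
    return [m.group(0).replace(",", "") for m in PATTERN.finditer(text)]

print(extract_ids("RE: Markitwire: 120432889: Mx: 24,693,059"))
print(extract_ids("RE: BBVA-MAD MMX_24644644 + MMX_24644645"))
print(extract_ids("TGF00023242526"))
```

The 9-digit 120432889 and the zero-padded TGF00023242526 are rejected because a digit sits directly before or after the candidate.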

Filter a string using regular expression

I tried the following code. However, the result is not what I want.
$strLine = "100.11 Q9"
$sortString = StringRegExp ($strLine,'([0-9\.]{1,7})', $STR_REGEXPARRAYMATCH)
MsgBox(0, "", $sortString[0],2)
The output shows 100.11, but I want 100.11 9. How could I display it this way using a regular expression?
$sPattern = "([0-9\.]+)\sQ(\d+)"
$strLine = "100.11 Q9"
$sortString = StringRegExpReplace($strLine, $sPattern, '\1 \2')
MsgBox(0, "$sortString", $sortString, 2)
$strLine = "100.11 Q9"
$sortString = StringRegExp($strLine, $sPattern, 3); array of global matches.
For $i1 = 0 To UBound($sortString) -1
MsgBox(0, "$sortString[" & $i1 & "]", $sortString[$i1], 2)
Next
The pattern captures two groups, 100.11 and 9.
The first group matches any run of digits and dots until it reaches
\s, which matches the space. The pattern then matches the Q, and the
second group matches the remaining digits.
StringRegExpReplace replaces the whole string with the first and second groups
separated by a space.
StringRegExp returns the two groups as two array elements.
Choose whichever of the two approaches above you prefer.
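For readers more familiar with Python than AutoIt, the same two approaches look roughly like this. It is only an illustrative sketch using the pattern from the answer; the variable names are made up for the example.

```python
import re

pattern = re.compile(r"([0-9.]+)\sQ(\d+)")
line = "100.11 Q9"

# Approach 1: rewrite the string, keeping only the two captured groups.
replaced = pattern.sub(r"\1 \2", line)

# Approach 2: read the two groups directly from the match.
match = pattern.search(line)
groups = match.groups() if match else ()

print(replaced)
print(groups)
```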

Regex to extract value at fixed position index

I have the following string of characters:
73746174652C313A312C310D
|
- extract the value at this position
I would like to extract the value 1 (the 1 at the end of the string) using regex.
So basically a regex that acts as a charAt(index).
I need this solution for a 3rd party application that only supports regular expressions. Note that the application cannot access capture groups and does not support negative lookbehinds.
In C#:
(?<=^.{21})(.)
in JS:
/.(?=.{2}$)/
You could try:
(?<=^.{21}).
It won't work in JavaScript engines that lack lookbehind support (lookbehind was only added in ES2018), but perhaps it will work in your app.
It means: a single character preceded (?<= ... ) by the beginning of the string ^ plus 21 characters .{21} . So, in the end, it returns the 22nd character.
The 22nd character is in capture group 1.
/^.{21}(.)/
But what system are you in that requires this instead of normal string processing?
Depends how you want to match it ( x distance from the beginning or x distance from the end )
/(.).{2}$/ Third from the end (capturing group 1)
/^.{21}(.)/ 22nd character (capturing group 1)
//PHP
$str = '73746174652C313A312C310D';
$char = preg_replace('/(.).{2}$/','$1',$str); //3rd from last
preg_match('/(.).{2}$/',$str,$chars); //3rd from last
$char = $chars[1];
preg_match('/^.{21}(.)/',$str,$chars); //22nd character
$char = $chars[1];
//JS
var str = '73746174652C313A312C310D';
var ch = str.replace(/(.).{2}$/,'$1'); //3rd from last
var ch = str.match(/(.).{2}$/)[1]; //3rd from last
var ch = str.match(/^.{21}(.)/)[1]; //22nd character
If you have to work with your tool's "First match:" result, run it twice:
73746174652C313A312C310D - ^.{21}. = 73746174652C313A312C31
73746174652C313A312C31 - .$ = 1
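The two capture-free patterns suggested in this thread can be checked quickly in Python, which supports this fixed-width lookbehind. This is only a sanity-check sketch; whether the third-party application accepts either form is something that would still have to be tried there.

```python
import re

s = "73746174652C313A312C310D"

# Positive lookbehind: the character preceded by exactly 21 characters
# counted from the start of the string (the 22nd character).
by_lookbehind = re.search(r"(?<=^.{21}).", s).group(0)

# Positive lookahead: the character followed by exactly 2 characters
# before the end of the string (third from the end).
by_lookahead = re.search(r".(?=.{2}$)", s).group(0)

print(by_lookbehind, by_lookahead)
```

In both cases the whole match is the wanted character, so no capture group is needed.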