Groovy null regex - regex

I would like to do the same task as this question but with groovy.
REGEX: How to split string with space and double quote
def sourceString = "18 17 16 \"Arc 10 12 11 13\" \"Segment 10 23 33 32 12\" 23 76 21"
def myMatches = sourceString.findAll(/("[^"]+")|\S+/) { match, item -> item }
println myMatches
This is the result
[null, null, null, "Arc 10 12 11 13", "Segment 10 23 33 32 12", null, null, null]

Consider the following, which uses the Elvis operator:
def sourceString = '18 17 16 "Arc 10 12 11 13" "Segment 10 23 33 32 12" 23 76 21'
def regex = /"([^"]+)"|\S+/
def myMatches = sourceString.findAll(regex) { match, item ->
item ?: match
}
assert 8 == myMatches.size()
assert 18 == myMatches[0] as int
assert 17 == myMatches[1] as int
assert 16 == myMatches[2] as int
assert "Arc 10 12 11 13" == myMatches[3]
assert "Segment 10 23 33 32 12" == myMatches[4]
assert 23 == myMatches[5] as int
assert 76 == myMatches[6] as int
assert 21 == myMatches[7] as int

Returning the match instead of the item gives nearly the expected result, but the quotes remain. I don't know how to exclude them in the regex itself, but stripping them from the result works:
def myMatches = sourceString.findAll(/"([^"]+)"|\S+/) { match, item -> match.replace('"', '') }
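For comparison, here is a minimal Python sketch of the same "use the group if it matched, otherwise the whole match" idea. It is not part of the Groovy answers above; the function name split_quoted is made up for illustration.

```python
import re

# Group 1 captures the text inside double quotes; the \S+ branch has no group.
TOKEN = re.compile(r'"([^"]+)"|\S+')

def split_quoted(text):
    """Split on whitespace, keeping double-quoted runs together (hypothetical helper)."""
    # group(1) is None when the \S+ branch matched, so fall back to group(0),
    # which is the same fallback the Elvis operator performs in Groovy.
    return [m.group(1) or m.group(0) for m in TOKEN.finditer(text)]

source = '18 17 16 "Arc 10 12 11 13" "Segment 10 23 33 32 12" 23 76 21'
tokens = split_quoted(source)
print(tokens)
```

The nulls in the question come from the same cause: for the unquoted numbers the capture group takes no part in the match, so there is nothing to return for it.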

Related

Golang regex : Ignore multiple occurrences

I've got a simple need.
Giving this input (string) : 10 20 30 40 65 45 44 67 100 200 65 40 66 88 65
I need to get all numbers between 65 and 66.
Problem is when we have multiple occurrence of each limit.
With a regex like : (65).+(66), I captured 65 45 44 67 100 200 65 40 66. But I would like to get only 40.
How could I achieve this ?
https://regex101.com/r/9HoKxr/1
Sounds like you want to exclude any '65' between the two limits of your pattern, up to the first occurrence of '66'? It's a bit verbose, but what about:
\b65((?:\s(?:\d|[1-57-9]\d|6[0-47-9]|\d{3,}))+?)\s66\b
See an online demo
\b65 - Start with '65' preceded by a word-boundary;
( - Open capture group;
(?:\s - Open a non-capture group that starts with a single whitespace char;
(?:\d|[1-57-9]\d|6[0-47-9]|\d{3,}) - Nested non-capture group to match any integer but '65' or '66';
)+?) - Close the non-capture group and match it at least once but as few times as possible. Then close the capture group;
\s66\b - Match another space followed by '66' and word-boundary.
Note:
The capture group keeps its leading space, so we remove it with the Trim() function from the strings package;
In my example I have used '10 20 30 40 65 45 44 40 66 200 65 40 66 88 65', which returns multiple matches. In such a case it's established OP is looking for the 'shortest' matching substring;
By 'shortest' I mean the fewest elements when the substring is split on spaces (using the Fields function from the above-mentioned strings package). Therefore '123456' is preferred over '1 2 3' despite being the 'longer' substring in terms of characters;
Try:
package main

import (
	"fmt"
	"regexp"
	"strings"
)

func main() {
	s := `10 20 30 40 65 45 44 40 66 200 65 40 66 88 65`
	re := regexp.MustCompile(`\b65((?:\s(?:\d|[1-57-9]\d|6[0-47-9]|\d{3,}))+?)\s66\b`)
	matches := re.FindAllStringSubmatch(s, -1) // Retrieve all matches
	shortest := ``
	for _, m := range matches { // Loop over the matches; m[1] is the capture group
		// Keep the capture with the fewest space-separated elements
		if shortest == `` || len(strings.Fields(m[1])) < len(strings.Fields(shortest)) {
			shortest = strings.Trim(m[1], ` `)
		}
	}
	fmt.Println(shortest)
}
Try it for yourself here.
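As a sanity check on the number alternation, here is a rough Python port of the same pattern. This is only a sketch: the names NUMBER, BETWEEN and shortest_between are invented for illustration, and Python's re is a backtracking engine rather than Go's RE2, although this particular pattern behaves the same way on the inputs used here.

```python
import re

# The alternation that should accept every non-negative integer except 65 and 66.
NUMBER = r'(?:\d|[1-57-9]\d|6[0-47-9]|\d{3,})'
BETWEEN = re.compile(r'\b65((?:\s' + NUMBER + r')+?)\s66\b')

def shortest_between(text):
    """Return the capture with the fewest space-separated elements ('' if none)."""
    groups = [m.group(1).strip() for m in BETWEEN.finditer(text)]
    return min(groups, key=lambda g: len(g.split()), default='')

# Which integers below 1000 does the alternation reject?
rejected = [n for n in range(1000) if not re.fullmatch(NUMBER, str(n))]

print(rejected)
print(shortest_between('10 20 30 40 65 45 44 67 100 200 65 40 66 88 65'))
print(shortest_between('10 20 30 40 65 45 44 40 66 200 65 40 66 88 65'))
```

The rejected list is a quick way to confirm that the character classes exclude exactly the two limits and nothing else.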

Extract data using regex Dart/Flutter

I want to extract data from the following
Periode Aantal uur Sv-loon
01-06-2019 t/m 30-06-2019 35 € 800,00
01-05-2019 t/m 31-05-2019 35 € 1.056,00
01-04-2019 t/m 30-04-2019 35 € 800,00
01-03-2019 t/m 31-03-2019 35 € 800,00
01-02-2019 t/m 28-02-2019 35 € 800,00
Datum: 06 augustus 2019
The expected output is :
01-06-2019 t/m 30-06-2019 35 € 800,00
01-05-2019 t/m 31-05-2019 35 € 1.056,00
01-04-2019 t/m 30-04-2019 35 € 800,00
01-03-2019 t/m 31-03-2019 35 € 800,00
01-02-2019 t/m 28-02-2019 35 € 800,00
See what I have tried so far in this example.
You may use
Sv-loon\s*([\s\S]*?)\s*Datum:
See the regex demo. Details:
Sv-loon - a literal string
\s* - 0 or more whitespaces
([\s\S]*?) - Group 1: any 0 or more chars as few as possible
\s* - 0 or more whitespaces
Datum: - a literal string
See Dart demo:
void main() {
  String txt = "Periode Aantal uur Sv-loon\n01-06-2019 t/m 30-06-2019 35 € 800,00\n01-05-2019 t/m 31-05-2019 35 € 1.056,00\n01-04-2019 t/m 30-04-2019 35 € 800,00\n01-03-2019 t/m 31-03-2019 35 € 800,00\n01-02-2019 t/m 28-02-2019 35 € 800,00\nDatum: 06 augustus 2019";
  RegExp rx = RegExp(r'Sv-loon\s*([\s\S]*?)\s*Datum:');
  // firstMatch returns null when nothing matches, hence the nullable type
  RegExpMatch? match = rx.firstMatch(txt);
  if (match != null) {
    print(match.group(1));
  }
}
Output
01-06-2019 t/m 30-06-2019 35 € 800,00
01-05-2019 t/m 31-05-2019 35 € 1.056,00
01-04-2019 t/m 30-04-2019 35 € 800,00
01-03-2019 t/m 31-03-2019 35 € 800,00
01-02-2019 t/m 28-02-2019 35 € 800,00
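The same capture-between-two-markers pattern can be tried outside Dart. Below is a small Python sketch, assuming a shortened copy of the input; the helper name rows_between is hypothetical.

```python
import re

txt = (
    "Periode Aantal uur Sv-loon\n"
    "01-06-2019 t/m 30-06-2019 35 € 800,00\n"
    "01-05-2019 t/m 31-05-2019 35 € 1.056,00\n"
    "01-02-2019 t/m 28-02-2019 35 € 800,00\n"
    "Datum: 06 augustus 2019"
)

# [\s\S] matches any character including newlines; *? keeps the capture minimal.
BLOCK = re.compile(r'Sv-loon\s*([\s\S]*?)\s*Datum:')

def rows_between(text):
    """Return the lines found between 'Sv-loon' and 'Datum:' (empty list if absent)."""
    m = BLOCK.search(text)
    return m.group(1).splitlines() if m else []

rows = rows_between(txt)
for row in rows:
    print(row)
```

Because the \s* on either side of the group absorbs the surrounding newlines, the captured text needs no extra trimming.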
Extract Date only:
void main() {
String inputString = "Your String 1/19/2023 9:29:11 AM";
RegExp dateRegex = new RegExp(r"(\d{1,2}\/\d{1,2}\/\d{4})");
Iterable<RegExpMatch> matches = dateRegex.allMatches(inputString);
for (RegExpMatch m in matches) {
print(m.group(0));
}
}
This will output:
1/19/2023
Extract Date and time:
void main() {
String inputString = "Your String 1/19/2023 9:29:11 AM";
RegExp dateTimeRegex = new RegExp(r"(\d{1,2}\/\d{1,2}\/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M)");
Iterable<RegExpMatch> matches = dateTimeRegex.allMatches(inputString);
for (RegExpMatch m in matches) {
print(m.group(0));
}
}
This will output: 1/19/2023 9:29:11 AM
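Another option uses lookarounds:
RegExp re = new RegExp("((?<=Sv-loon)([\\S\\s]*?)(?=Datum:))");
Here ?<= is a positive lookbehind and ?= is a positive lookahead, so the match itself excludes both markers.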
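One thing to watch with the lookaround version is that nothing consumes the whitespace around the wanted text. A minimal Python sketch, assuming a shortened input and Python's re (which allows this fixed-width lookbehind); the variable names are illustrative only:

```python
import re

txt = (
    "Periode Aantal uur Sv-loon\n"
    "01-06-2019 t/m 30-06-2019 35 € 800,00\n"
    "01-05-2019 t/m 31-05-2019 35 € 1.056,00\n"
    "Datum: 06 augustus 2019"
)

# Lookbehind and lookahead assert the markers without including them in the match.
LOOKAROUND = re.compile(r'(?<=Sv-loon)[\s\S]*?(?=Datum:)')

m = LOOKAROUND.search(txt)
raw = m.group(0)                    # still carries the newlines next to the markers
rows = raw.strip().splitlines()     # trim before splitting into rows

print(repr(raw))
print(rows)
```

So with lookarounds the result usually needs a strip (trim in Dart) before use.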

dart regex remove space phone

I tried all the regex solutions from this question, but none of them match: REGEX Remove Space
I work with Dart and Flutter, and I am trying to capture only the digits from these types of string:
case 1
aaaaaaaaa 06 12 34 56 78 aaaaaa
case 2
aaaaaaaa 0612345678 aaaaaa
case 3
aaaaaa +336 12 34 56 78 aaaaa
I want to get only 0612345678, with no spaces and no +33: just 10 digits. In the case of +33, I need to replace +33 with 0.
Currently I have this code, \D*(\d+)\D*?, which works for case 2.
You may match and capture an optional +33 and then a digit followed with spaces or digits, and then check if Group 1 matched and then build the result accordingly.
Here is an example solution (tested):
void main() {
  var strs = ['aaaaaaaaa 06 12 34 56 78 aaaaaa', 'aaaaaaaa 0612345678 aaaaaa', 'aaaaaa +336 12 34 56 78 aaaaa', 'more +33 6 12 34 56 78'];
  var rx = RegExp(r"(?:^|\D)(\+33)?\s*(\d[\d ]*)(?!\d)");
  for (var s in strs) {
    var match = rx.firstMatch(s);
    if (match != null) {
      // Group 2 always takes part in a match, so `!` is safe here
      var digits = match.group(2)!.replaceAll(" ", "");
      // Group 1 is non-null only when the +33 prefix was present
      print(match.group(1) != null ? "0" + digits : digits);
    }
  }
}
Prints 0612345678 four times, once for each input string.
The pattern is
(?:^|\D)(\+33)?\s*(\d[\d ]*)(?!\d)
See its demo here.
(?:^|\D) - start of string or any char other than a digit
(\+33)? - Group 1 that captures +33 1 or 0 times
\s* - any 0+ whitespaces
(\d[\d ]*) - Group 2: a digit followed with spaces or/and digits
(?!\d) - no digit immediately to the right is allowed.
Spaces are removed from Group 2 with a match.group(2).replaceAll(" ", "") since one can't match discontinuous strings within one match operation.
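The two-step approach (match with an optional +33 group, then strip spaces afterwards) can be sketched in Python as follows. The function name normalize_phone is hypothetical and the sketch handles only the shapes shown in the question.

```python
import re

PHONE = re.compile(r'(?:^|\D)(\+33)?\s*(\d[\d ]*)(?!\d)')

def normalize_phone(text):
    """Return the first phone number as bare digits, or None if there is none."""
    m = PHONE.search(text)
    if m is None:
        return None
    # A single match cannot skip the spaces, so remove them in a second step.
    digits = m.group(2).replace(' ', '')
    # Group 1 is set only when the +33 prefix was present; swap it for a 0.
    return '0' + digits if m.group(1) else digits

samples = [
    'aaaaaaaaa 06 12 34 56 78 aaaaaa',
    'aaaaaaaa 0612345678 aaaaaa',
    'aaaaaa +336 12 34 56 78 aaaaa',
    'more +33 6 12 34 56 78',
]
results = [normalize_phone(s) for s in samples]
print(results)
```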

vba: regex, remove all but cell reference row references from formula strings

In VBA I am trying to build a generalized function that turns strings like these:
a) =IFERROR(PERCENTRANK($FU$23:$FU$2515,FU24,3)*100,FY$17)
b) =IF(FZ$16=(BDP($C24,FZ$18,FZ$19,"EQY_FUND_CRNCY",FX)),FZ$17,IF($B24="","",BDP($C24,FZ$18,FZ$19,"EQY_FUND_CRNCY",FX)))
c) =IF(ISNUMBER(FU24),TRUNC((((COUNTIF($J$23:$J$2515,$J24)-(SUMPRODUCT(($J$23:$J$2515=$J24)*(FU24<FU$23:FU$2515))))/COUNTIF($J$23:$J$2515,$J24)))*100,2),FX$17)
d) =IFERROR(PERCENTRANK(EO$23:EO$2515,EO24,3)*(-100)+100,ET$17)
e) =BDP($C24,EH$18,EH$19,"EQY_FUND_CRNCY",FX)
Into these:
a) 23 2515 24 17
b) 16 24 18 19 17 24 24 18 19
c) 24 23 2515 24 23 2515 24 24 23 2515 23 2515 24 17
d) 23 2515 24 17
e) 24 18 19
In other words, remove everything except cell reference rows and separate them with spaces (or some other delimiter) so I can VBA.split(x," ") them later.
Notes:
Numbers that aren't part of a cell references are removed.
To use this function you must have the regular expression library. If the code below doesn't work for you include the library: http://support.microsoft.com/kb/818802. (Side note: if you know how to include the library in the code without having to follow those instructions, please share.)
The list of formulas in this example is just an example. I am looking for a generalized solution.
I built this little test sub that might be helpful (IT DOESN'T DO WHAT I WANT):
Sub test()
Dim s As String
s = "=IFERROR(PERCENTRANK($FU$23:$FU$2515,FU24,3)*100,FY$17)"
Dim s2 As String
Dim s3 As String
Dim s1 As String
Static re As RegExp
If re Is Nothing Then Set re = New RegExp
re.IgnoreCase = True
re.Global = True
re.Pattern = "[$]"
s1 = re.Replace(s, "")
re.Pattern = "[^A-Z0-9 ]"
s2 = re.Replace(s1, " ")
re.Pattern = "[^0-9]"
s3 = re.Replace(s2, " ")
Debug.Print s3
End Sub
Try:
Sub test()
Dim s As String, matches, m
s = "=IFERROR(PERCENTRANK($FU$23:$FU$2515,FU24,3)*100,FY$17)"
Static re As Object
If re Is Nothing Then
Set re = CreateObject("VBScript.RegExp") 'late binding
re.IgnoreCase = True
re.Global = True
re.Pattern = "[A-Z]+\$?(\d+)"
End If
Set matches = re.Execute(s)
If matches.Count > 0 Then
For Each m In matches
Debug.Print m.SubMatches(0)
Next m
End If
End Sub
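To check the pattern against more than one of the formulas, here is a quick Python sketch of the same idea. The function name row_refs is made up, and the sketch shares the pattern's limitation that a function name ending in digits would also be picked up.

```python
import re

# Column letters, an optional $ before the row, then the row number in group 1.
CELL_ROW = re.compile(r'[A-Z]+\$?(\d+)', re.IGNORECASE)

def row_refs(formula):
    """Return the row numbers of all cell references, joined by spaces."""
    return ' '.join(CELL_ROW.findall(formula))

a = '=IFERROR(PERCENTRANK($FU$23:$FU$2515,FU24,3)*100,FY$17)'
d = '=IFERROR(PERCENTRANK(EO$23:EO$2515,EO24,3)*(-100)+100,ET$17)'
e = '=BDP($C24,EH$18,EH$19,"EQY_FUND_CRNCY",FX)'

print(row_refs(a))
print(row_refs(d))
print(row_refs(e))
```

Bare numbers such as 3, 100 and -100 are skipped because no letter precedes them.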

Regex to detect ASCII art on a single line.

Basically I want to find ASCII art on one line. For me this is any run of 2 or more characters that are not alphanumeric, ignoring whitespace. So a line might look like:
This is a !# Test of --> ASCII art detection ### <--
So the matches I should get are :
!#
-->
###
<--
I came up with this which still selects spaces :(
\b\W{2,}
I'm using the following website for testing:
http://gskinner.com/RegExr/
Thanks for the help, it's much appreciated!!
I'd suggest something like this:
[^\w\s]{2,}
This will match any sequence of two or more characters that are not word characters (which include alphanumeric characters and underscores) or whitespace characters.
Demonstration
If you would also like to match underscores as part of your 'ASCII art', you'd have to be more specific:
[^a-zA-Z0-9\s]{2,}
Demonstration
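Both patterns are easy to try on the sample line. A minimal Python sketch (the names ART and ART_WITH_UNDERSCORE are illustrative only):

```python
import re

# Two or more characters that are neither word characters nor whitespace.
ART = re.compile(r'[^\w\s]{2,}')
# Same idea, but underscores count as art because \w is spelled out without "_".
ART_WITH_UNDERSCORE = re.compile(r'[^a-zA-Z0-9\s]{2,}')

line = 'This is a !# Test of --> ASCII art detection ### <--'
found = ART.findall(line)
print(found)

underscored = 'snake __ case'
print(ART.findall(underscored))
print(ART_WITH_UNDERSCORE.findall(underscored))
```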
I think this
((?=[\x21-\x7e])[\W_]){2,}
is probably equivalent to this
[[:punct:]]{2,}
Using POSIX, the supported punctuation is listed below
(to add more, just add it to the class: [[:punct:]<add here>]{2,})
33 = !
34 = "
35 = #
36 = $
37 = %
38 = &
39 = '
40 = (
41 = )
42 = *
43 = +
44 = ,
45 = -
46 = .
47 = /
58 = :
59 = ;
60 = <
61 = =
62 = >
63 = ?
64 = @
91 = [
92 = \
93 = ]
94 = ^
95 = _
96 = `
123 = {
124 = |
125 = }
126 = ~
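The table can be cross-checked in Python. Python's re module has no POSIX classes such as [[:punct:]], so this sketch compares the lookahead pattern against string.punctuation instead, which holds the same 32 ASCII punctuation characters.

```python
import re
import string

# Printable ASCII (0x21-0x7e) that is either a non-word character or "_".
PUNCT = re.compile(r'(?=[\x21-\x7e])[\W_]')

# Collect every ASCII character the pattern accepts, in code-point order.
matched = ''.join(chr(c) for c in range(128) if PUNCT.fullmatch(chr(c)))

print(matched)
print(len(matched))
print(matched == string.punctuation)
```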