Extract data using regex Dart/Flutter - regex

I want to extract data from the following text:
Periode Aantal uur Sv-loon
01-06-2019 t/m 30-06-2019 35 € 800,00
01-05-2019 t/m 31-05-2019 35 € 1.056,00
01-04-2019 t/m 30-04-2019 35 € 800,00
01-03-2019 t/m 31-03-2019 35 € 800,00
01-02-2019 t/m 28-02-2019 35 € 800,00
Datum: 06 augustus 2019
The expected output is :
01-06-2019 t/m 30-06-2019 35 € 800,00
01-05-2019 t/m 31-05-2019 35 € 1.056,00
01-04-2019 t/m 30-04-2019 35 € 800,00
01-03-2019 t/m 31-03-2019 35 € 800,00
01-02-2019 t/m 28-02-2019 35 € 800,00
See the example for what I have tried so far.

You may use
Sv-loon\s*([\s\S]*?)\s*Datum:
Details:
Sv-loon - a literal string
\s* - 0 or more whitespaces
([\s\S]*?) - Group 1: any 0 or more chars as few as possible
\s* - 0 or more whitespaces
Datum: - a literal string
See Dart demo:
String txt = "Periode Aantal uur Sv-loon\n01-06-2019 t/m 30-06-2019 35 € 800,00\n01-05-2019 t/m 31-05-2019 35 € 1.056,00\n01-04-2019 t/m 30-04-2019 35 € 800,00\n01-03-2019 t/m 31-03-2019 35 € 800,00\n01-02-2019 t/m 28-02-2019 35 € 800,00\nDatum: 06 augustus 2019";
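RegExp rx = RegExp(r'Sv-loon\s*([\s\S]*?)\s*Datum:');
// firstMatch returns null when the pattern is not found, hence the nullable type.
RegExpMatch? match = rx.firstMatch(txt);
if (match != null) {
  print(match.group(1)); // Group 1 holds the rows between the two markers.
}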
Output
01-06-2019 t/m 30-06-2019 35 € 800,00
01-05-2019 t/m 31-05-2019 35 € 1.056,00
01-04-2019 t/m 30-04-2019 35 € 800,00
01-03-2019 t/m 31-03-2019 35 € 800,00
01-02-2019 t/m 28-02-2019 35 € 800,00

Extract Date only:
void main() {
  String inputString = "Your String 1/19/2023 9:29:11 AM";
  RegExp dateRegex = RegExp(r"(\d{1,2}\/\d{1,2}\/\d{4})");
  Iterable<RegExpMatch> matches = dateRegex.allMatches(inputString);
  for (RegExpMatch m in matches) {
    print(m.group(0));
  }
}
This will output:
1/19/2023
Extract Date and time:
void main() {
  String inputString = "Your String 1/19/2023 9:29:11 AM";
  RegExp dateTimeRegex = RegExp(r"(\d{1,2}\/\d{1,2}\/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M)");
  Iterable<RegExpMatch> matches = dateTimeRegex.allMatches(inputString);
  for (RegExpMatch m in matches) {
    print(m.group(0));
  }
}
This will output: 1/19/2023 9:29:11 AM

RegExp re = new RegExp("((?<=Sv-loon)([\\S\\s]*?)(?=Datum:))");
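Here (?<=...) is a positive lookbehind and (?=...) is a positive lookahead, so neither Sv-loon nor Datum: is part of the match. The whitespace around the rows is still matched and has to be trimmed afterwards.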
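To see what the lookaround variant returns, here is a minimal sketch in Python (chosen only for illustration; the sample text is shortened to two rows and the variable names are hypothetical). It assumes the same input layout as in the question.

```python
import re

# Shortened sample: header line, two data rows, trailing "Datum:" line.
txt = (
    "Periode Aantal uur Sv-loon\n"
    "01-06-2019 t/m 30-06-2019 35 € 800,00\n"
    "01-05-2019 t/m 31-05-2019 35 € 1.056,00\n"
    "Datum: 06 augustus 2019"
)

# Lookbehind and lookahead assert the markers without consuming them.
lookaround = re.compile(r"(?<=Sv-loon)([\s\S]*?)(?=Datum:)")
m = lookaround.search(txt)

raw = m.group(0)                    # still has a newline at each end
rows = raw.strip().splitlines()     # trim, then split into one row per line
print(rows)
```

Unlike the pattern with \s* on both sides of the group, the raw match here begins and ends with a newline, which is why the sketch strips it before splitting.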

Related

Golang regex : Ignore multiple occurrences

I've got a simple need.
Giving this input (string) : 10 20 30 40 65 45 44 67 100 200 65 40 66 88 65
I need to get all numbers between 65 and 66.
Problem is when we have multiple occurrence of each limit.
With a regex like : (65).+(66), I captured 65 45 44 67 100 200 65 40 66. But I would like to get only 40.
How could I achieve this ?
https://regex101.com/r/9HoKxr/1
Sounds like you want to avoid matching another '65' inside your match, up to the first occurrence of '66'? It's a bit verbose, but what about:
\b65((?:\s(?:\d|[1-57-9]\d|6[0-47-9]|\d{3,}))+?)\s66\b
\b65 - Match '65' preceded by a word boundary;
( - Open capture group;
(?:\s - Open a non-capture group that starts with a whitespace char;
(?:\d|[1-57-9]\d|6[0-47-9]|\d{3,}) - Nested non-capture group to match any integer but '65' or '66';
)+?) - Close the non-capture group and match it at least once but as few times as possible, then close the capture group;
\s66\b - Match another whitespace char followed by '66' and a word boundary.
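A quick way to check that the nested group really accepts every integer except 65 and 66 is to test it against a range of numbers. The sketch below does that in Python (used here only for illustration; the variable names are hypothetical), applying the alternation on its own to whole number strings.

```python
import re

# The nested group from the pattern above, tested on whole number strings.
not_65_or_66 = re.compile(r"\d|[1-57-9]\d|6[0-47-9]|\d{3,}")

# Collect every integer in 0..999 whose decimal form the group rejects.
rejected = [n for n in range(1000) if not not_65_or_66.fullmatch(str(n))]
print(rejected)  # prints [65, 66]
```

Note that the \d{3,} branch accepts any run of three or more digits, so a zero-padded form such as 065 would not be excluded.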
Note:
Leading spaces are handled with the Trim() function from the strings package;
In my example I have used '10 20 30 40 65 45 44 40 66 200 65 40 66 88 65', which returns multiple matches. In such a case I take it that OP is looking for the 'shortest' matching substring;
By 'shortest' I mean the fewest elements when the substring is split on spaces (using the Fields function from the strings package mentioned above). Therefore '123456' is preferred over '1 2 3' despite being the longer substring in terms of characters;
Try:
package main

import (
	"fmt"
	"regexp"
	"strings"
)

func main() {
	s := `10 20 30 40 65 45 44 40 66 200 65 40 66 88 65`
	re := regexp.MustCompile(`\b65((?:\s(?:\d|[1-57-9]\d|6[0-47-9]|\d{3,}))+?)\s66\b`)
	matches := re.FindAllStringSubmatch(s, -1) // Retrieve all matches
	shortest := ``
	for _, m := range matches { // Loop over the matches
		// Keep the capture with the fewest space-separated fields
		if shortest == `` || len(strings.Fields(m[1])) < len(strings.Fields(shortest)) {
			shortest = strings.Trim(m[1], ` `)
		}
	}
	fmt.Println(shortest)
}

Regex to get total price with space as separator

I need to build a regex that would catch the total price. Here are some examples:
Total: 145.01 $
Total: 1 145.01 $
Total: 00.01 $
Total: 12 345.01 $
It needs to capture any price that follows 'Total: ', without the '$'.
This is what I have so far: (?<=\bTotal:\s*)(\d+.\d+)
I assume:
each string must begin with 'Total:' followed by three spaces, the prefix;
the last digit in the string must be followed by ' $' (one space and a dollar sign), the suffix, which is at the end of the string;
the substring between the prefix and suffix must end '.dd', where 'd' represents any digit, the cents;
the substring between the prefix and cents must match one of the following patterns, where 'd' represents any digit: 'd', 'dd', 'ddd', 'd ddd', 'dd ddd', 'ddd ddd', 'd ddd ddd', 'dd ddd ddd', 'ddd ddd ddd', 'd ddd ddd ddd' and so on;
the return value is the substring between the prefix and suffix that meets the above requirements; and
spaces will be removed from the substring returned as a separate step at the end.
We can use the following regular expression.
r = /\ATotal: {3}(\d{1,3}(?: \d{3})*\.\d{2}) \$\z/
In Ruby (but if you don't know Ruby you'll get the idea):
arr = <<~_.split(/\n/)
Total: 145.01 $
Total: 1 145.01 $
Total: 00.01 $
Total: 12 345.01 $
Total: 1 241 345.01 $
Total: 1.00 $
Total: 1.00$
Total: 1.00 $x
My Total: 1.00 $
Total: 12 34.01 $
_
The following matches each string in the array arr and extracts the contents of capture group 1, which is shown on the right side of each line.
arr.each do |s|
puts "\"#{(s + '"[r,1]').ljust(30)}: #{s[r,1] || 'no match'}"
end
"Total: 145.01 $"[r,1] : 145.01
"Total: 1 145.01 $"[r,1] : 1 145.01
"Total: 00.01 $"[r,1] : 00.01
"Total: 12 345.01 $"[r,1] : 12 345.01
"Total: 1 241 345.01 $"[r,1] : 1 241 345.01
"Total: 1.00 $"[r,1] : no match
"Total: 1.00$"[r,1] : no match
"Total: 1.00 $x"[r,1] : no match
"My Total: 1.00 $"[r,1] : no match
"Total: 12 34.01 $"[r,1] : no match
The regular expression can be written in free-spacing mode to make it self-documenting.
r = /
  \A                # match the beginning of the string
  Total:\ {3}       # match 'Total:' followed by 3 spaces
  (                 # begin capture group 1
    \d{1,3}         # match 1, 2 or 3 digits
    (?:\ \d{3})     # match a space followed by 3 digits
    *               # perform the previous match zero or more times
    \.\d{2}         # match a period followed by 2 digits
  )                 # end capture group 1
  \ \$              # match a space followed by a dollar sign
  \z                # match end of string
/x                  # free-spacing regex definition mode
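The last assumption, removing the spaces as a separate step, is not shown in the Ruby code. A minimal sketch of that step in Python follows (used only for illustration; `extract_total` is a hypothetical helper name, and the sample lines assume three spaces after 'Total:' as the pattern requires).

```python
import re

# Same pattern as the Ruby regex; Python spells the end-of-string anchor \Z.
TOTAL_RE = re.compile(r"\ATotal: {3}(\d{1,3}(?: \d{3})*\.\d{2}) \$\Z")

def extract_total(line):
    """Return the amount with grouping spaces removed, or None if no match."""
    m = TOTAL_RE.match(line)
    if m is None:
        return None
    # Separate final step: drop the thousands-separator spaces.
    return m.group(1).replace(" ", "")

print(extract_total("Total:   12 345.01 $"))  # prints 12345.01
print(extract_total("Total:   12 34.01 $"))   # prints None
```

Keeping the space removal outside the regex keeps the pattern strict about digit grouping while still yielding a value that can be parsed as a number.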

dart regex remove space phone

I tried all the solutions from 'REGEX Remove Space', but none of them match.
I work with Dart and Flutter, and I am trying to capture only the digits from this type of string:
case 1
aaaaaaaaa 06 12 34 56 78 aaaaaa
case 2
aaaaaaaa 0612345678 aaaaaa
case 3
aaaaaa +336 12 34 56 78 aaaaa
I want to get only 0612345678, with no spaces and no +33: just 10 digits. In the case of +33, I need to replace +33 with 0.
Currently I have this pattern, \D*(\d+)\D*?, which works for case 2.
You may match and capture an optional +33 and then a digit followed with spaces or digits, and then check if Group 1 matched and then build the result accordingly.
Here is an example solution (tested):
var strs = [
  'aaaaaaaaa 06 12 34 56 78 aaaaaa',
  'aaaaaaaa 0612345678 aaaaaa',
  'aaaaaa +336 12 34 56 78 aaaaa',
  'more +33 6 12 34 56 78'
];
var rx = RegExp(r"(?:^|\D)(\+33)?\s*(\d[\d ]*)(?!\d)");
for (var s in strs) {
  var match = rx.firstMatch(s);
  if (match != null) {
    // Group 2 always participates in a successful match, so `!` is safe here.
    var digits = match.group(2)!.replaceAll(" ", "");
    // Prepend "0" when the optional +33 prefix (Group 1) was matched.
    print(match.group(1) != null ? "0" + digits : digits);
  }
}
This prints 0612345678 once for each of the four input strings.
The pattern is
(?:^|\D)(\+33)?\s*(\d[\d ]*)(?!\d)
(?:^|\D) - start of string or any char other than a digit
(\+33)? - Group 1 that captures +33 1 or 0 times
\s* - any 0+ whitespaces
(\d[\d ]*) - Group 2: a digit followed with spaces or/and digits
(?!\d) - no digit immediately to the right is allowed.
Spaces are removed from Group 2 with a match.group(2).replaceAll(" ", "") since one can't match discontinuous strings within one match operation.

Groovy null regex

I would like to do the same task as in the question 'REGEX: How to split string with space and double quote', but with Groovy.
def sourceString = "18 17 16 \"Arc 10 12 11 13\" \"Segment 10 23 33 32 12\" 23 76 21"
def myMatches = sourceString.findAll(/("[^"]+")|\S+/) { match, item -> item }
println myMatches
This is the result
[null, null, null, "Arc 10 12 11 13", "Segment 10 23 33 32 12", null, null, null]
Consider the following, which uses the Elvis operator:
def sourceString = '18 17 16 "Arc 10 12 11 13" "Segment 10 23 33 32 12" 23 76 21'
def regex = /"([^"]+)"|\S+/
def myMatches = sourceString.findAll(regex) { match, item ->
    // item (group 1) is null when the \S+ branch matched, so fall back to the whole match
    item ?: match
}
assert 8 == myMatches.size()
assert 18 == myMatches[0] as int
assert 17 == myMatches[1] as int
assert 16 == myMatches[2] as int
assert "Arc 10 12 11 13" == myMatches[3]
assert "Segment 10 23 33 32 12" == myMatches[4]
assert 23 == myMatches[5] as int
assert 76 == myMatches[6] as int
assert 21 == myMatches[7] as int
Returning the match instead of item gives nearly the expected result, but the quotes remain. I don't know how to exclude them with the regex itself, but removing the quotes from the result works:
def myMatches = sourceString.findAll(/"([^"]+)"|\S+/) { match, item -> match.replace('"', '') }

Regex to detect ASCII art on a single line.

Basically I want to find ASCII art on one line. For me this is any run of 2 or more characters that are not alphanumeric, ignoring whitespace. So a line might look like:
This is a !# Test of --> ASCII art detection ### <--
So the matches I should get are :
!#
-->
###
<--
I came up with this, which still selects spaces :(
\b\W{2,}
I'm using the following website for testing:
http://gskinner.com/RegExr/
Thanks for the help, it's much appreciated!
I'd suggest something like this:
[^\w\s]{2,}
This will match any sequence of two or more characters that are not word characters (which include alphanumeric characters and underscores) or whitespace characters.
If you would also like to match underscores as part of your 'ASCII art', you'd have to be more specific:
[^a-zA-Z0-9\s]{2,}
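Both patterns can be tried on the sample line from the question. The sketch below uses Python only for illustration; the second test string with underscores is a made-up example, not from the question.

```python
import re

line = "This is a !# Test of --> ASCII art detection ### <--"

no_underscore = re.compile(r"[^\w\s]{2,}")            # underscore counts as a word char
with_underscore = re.compile(r"[^a-zA-Z0-9\s]{2,}")   # underscore counts as art

art = no_underscore.findall(line)
print(art)  # prints ['!#', '-->', '###', '<--']

# Hypothetical input showing where the two patterns differ.
sample = "a __ b _-_ c"
print(no_underscore.findall(sample))    # prints []
print(with_underscore.findall(sample))  # prints ['__', '_-_']
```

In the first pattern the lone '-' inside '_-_' is only one character long, so nothing in the second string reaches the minimum of two.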
I think this
((?=[\x21-\x7e])[\W_]){2,}
is probably equivalent to this
[[:punct:]]{2,}
Using POSIX, the supported punctuation is listed below (to add more characters, add them to the class: [[:punct:]<add here>]{2,}):
33 = !
34 = "
35 = #
36 = $
37 = %
38 = &
39 = '
40 = (
41 = )
42 = *
43 = +
44 = ,
45 = -
46 = .
47 = /
58 = :
59 = ;
60 = <
61 = =
62 = >
63 = ?
64 = @
91 = [
92 = \
93 = ]
94 = ^
95 = _
96 = `
123 = {
124 = |
125 = }
126 = ~
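The claimed equivalence can be checked over the ASCII range. Python's re module has no POSIX bracket classes, so the sketch below stands in `string.punctuation` for [[:punct:]]; that substitution is an assumption of this example, as are the variable names.

```python
import re
import string

# One repetition of the lookahead pattern: a printable, non-space ASCII
# char (0x21-0x7e) that is a non-word char or an underscore.
lookahead_form = re.compile(r"(?=[\x21-\x7e])[\W_]")

# Stand-in for [[:punct:]]: the 32 ASCII punctuation characters.
punct_class = re.compile("[" + re.escape(string.punctuation) + "]")

codes = [c for c in range(128) if lookahead_form.fullmatch(chr(c))]
same = all(
    bool(lookahead_form.fullmatch(chr(c))) == bool(punct_class.fullmatch(chr(c)))
    for c in range(128)
)
print(len(codes), same)  # prints 32 True
print(chr(64))           # prints @
```

The 32 codes found are 33-47, 58-64, 91-96 and 123-126, the same ranges as in the list above.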