I have lines of text as follows. I only want the first date after Examination Date, so the expected output is 10.08.2017:
Examination Date
date: 10.08.2017
423432
tert
g
534534
Examination Date: 04-07-2017
So far I have tried:
Examination Date.*?\d{2}.?{2}?.\d{4}
but the match covers the entire text up to 04-07-2017.
Fix the pattern by adding \d before {2}, removing the unnecessary ? quantifiers, and capturing the value you need:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
    public static void main(String[] args) {
        String s = "Examination Date \n\ndate: 10.08.2017 \n423432\n\ntert\n\ng\n\n534534\n\nExamination Date: 04-07-2017";
        Pattern pattern = Pattern.compile("Examination Date.*?\\b(\\d{2}\\W\\d{2}\\W\\d{4})\\b", Pattern.DOTALL);
        Matcher matcher = pattern.matcher(s);
        if (matcher.find()) { // a single find() returns only the first match
            System.out.println(matcher.group(1)); // => 10.08.2017
        }
    }
}
See the Java demo and the regex demo. The code gets only the first match because if is used rather than while, and . matches line breaks thanks to the Pattern.DOTALL flag.
Details
Examination Date - a literal substring
.*? - any 0+ chars (including line breaks, thanks to DOTALL), as few as possible
\\b - a word boundary (if you do not care about matching the date as a "whole" word, remove the \\b)
(\\d{2}\\W\\d{2}\\W\\d{4}) - Group 1:
\\d{2} - 2 digits
\\W - any non-word char (punctuation, space, symbol)
\\d{2}\\W - as above
\\d{4} - 4 digits
\\b - a trailing word boundary.
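For comparison, here is a rough Python port of the same pattern (Python is used only for illustration; the sample string is copied from the Java snippet). re.DOTALL plays the role of Pattern.DOTALL, re.search stops at the first match like a single find() call, and re.findall shows what a while loop would collect.

```python
import re

# Sample text copied from the Java snippet above
s = ("Examination Date \n\ndate: 10.08.2017 \n423432\n\ntert\n\ng\n\n534534"
     "\n\nExamination Date: 04-07-2017")

# Same pattern; re.DOTALL lets .*? cross line breaks, like Pattern.DOTALL
pattern = re.compile(r"Examination Date.*?\b(\d{2}\W\d{2}\W\d{4})\b", re.DOTALL)

first = pattern.search(s)   # like one matcher.find(): first match only
print(first.group(1))       # 10.08.2017

print(pattern.findall(s))   # like a while loop: ['10.08.2017', '04-07-2017']
```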
Related
I wish to match a filename with column and line info, e.g.:
\path1\path2\a_file.ts:17:9
// what I want to achieve:
match[1]: a_file.ts
match[2]: 17
match[3]: 9
This string can have garbage before and after the pattern, like
(at somewhere: \path1\path2\a_file.ts:17:9 something)
What I have now is this regex, which manages to match the column and line, but I got stuck on the filename capturing part. I guess a negative lookahead is the way to go, but it seems to match all the previous groups and the garbage text at the end of the string.
(?!.*[\/\\]):(\d+):(\d+)\D*$
Here's a link to the current implementation on regex101.
You can replace the lookahead with a negated character class:
([^\/\\]+):(\d+):(\d+)\D*$
See the regex demo. Details:
([^\/\\]+) - Group 1: one or more chars other than / and \
: - a colon
(\d+) - Group 2: one or more digits
: - a colon
(\d+) - Group 3: one or more digits
\D*$ - zero or more non-digit chars up to the end of the string.
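A quick way to sanity-check the pattern is to run it against both sample strings from the question. This is a Python sketch (Python is used only for illustration); the pattern itself is copied unchanged.

```python
import re

# Pattern from the answer, unchanged
rx = re.compile(r"([^\/\\]+):(\d+):(\d+)\D*$")

samples = [
    r"\path1\path2\a_file.ts:17:9",
    r"(at somewhere: \path1\path2\a_file.ts:17:9 something)",
]

for text in samples:
    m = rx.search(text)
    if m:
        # groups() -> (file name, line, column)
        print(m.groups())  # ('a_file.ts', '17', '9') for both samples
```

Because of \D*$, the trailing garbage must not contain any digit: with something 2) at the end instead of something), the pattern finds no match.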
I'm trying to come up with a regex to replace an entire string with just its first two values. Examples:
Entire String: AO SMITH 100108283 4500W/240V SCREW-IN ELEMENT, 11"
First Two Values: AO SMITH
Entire String: BRA14X18HEBU / P11-042 / 310-470NL BRASS 1/4 x 1/8 HEX BUSHING
First Two Values: BRA14X18HEBU / P11-042
Entire String: TWO-HOLE PIPE STRAP 4" 008004EG 72E 4
First Two Values: TWO-HOLE PIPE
The caveat is that I want to preserve special characters such as / and - without counting them as values. The pattern I've written so far doesn't do that; instead, it leaves the new values entirely blank. Only the first example above works.
Here's what I've got so far:
Matching Value:
^(\w+) +(\w+).+$
New Value:
$1 $2
One option is to use a single capture group and reference it in the replacement:
^(\w+(?:-\w+)?(?: +\/)? +\w+(?:-\w+)?).+
The pattern matches:
^ Start of string
( Capture group 1
\w+(?:-\w+)? Match 1+ word chars, with an optional part to match a - and 1+ word chars
(?: +\/)? Optionally match 1+ spaces followed by /
 +\w+(?:-\w+)? Match 1+ spaces, then 1+ word chars with an optional part to match a - and 1+ word chars
) Close group 1
.+ Match 1+ times any char (the rest of the line)
If a word can contain more than one hyphen, use * instead of ? after each (?:-\w+) group.
Regex demo
Output
AO SMITH
BRA14X18HEBU / P11-042
TWO-HOLE PIPE
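The replacement can be reproduced with Python's re.sub, where the $1 backreference is written \1. A minimal sketch, assuming each description is a single-line string (the BRA14X18HEBU example is joined onto one line here):

```python
import re

# First pattern from the answer; .+ consumes the rest of the line
rx = re.compile(r"^(\w+(?:-\w+)?(?: +\/)? +\w+(?:-\w+)?).+")

descriptions = [
    'AO SMITH 100108283 4500W/240V SCREW-IN ELEMENT, 11"',
    "BRA14X18HEBU / P11-042 / 310-470NL BRASS 1/4 x 1/8 HEX BUSHING",
    'TWO-HOLE PIPE STRAP 4" 008004EG 72E 4',
]

for d in descriptions:
    # Replace the whole matched line with Group 1 only
    print(rx.sub(r"\1", d))
```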
A broader option is to match any non-word chars between the two words:
^(\w+(?:-\w+)*[\W\r\n]+\w+(?:-\w+)*).+
Regex demo
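To see what "broader" means in practice, here is a Python comparison of the two patterns on a hypothetical input (made up for illustration) where the first two words are separated by a comma rather than spaces or a slash:

```python
import re

narrow = re.compile(r"^(\w+(?:-\w+)?(?: +\/)? +\w+(?:-\w+)?).+")
broad = re.compile(r"^(\w+(?:-\w+)*[\W\r\n]+\w+(?:-\w+)*).+")

sample = "ACME, BOLT-M6 ZINC PLATED"  # hypothetical description

print(narrow.sub(r"\1", sample))  # no match, so the string comes back unchanged
print(broad.sub(r"\1", sample))   # ACME, BOLT-M6
```

Note that \r and \n are already covered by \W, so [\W\r\n] behaves the same as \W here.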
I'm stumped getting the following regex to work (VB.NET):
Input:
+1.41 DS +0.93 DC x 3* #12.5 mm (4.00 Rx Calc)
Expected Output:
+0.93
I've gotten as far as the following expression:
DS[ \t]*[+\-][ \t]*\d{1,2}\.\d{2}
This returns a result of
DS +0.93
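I need to return only +0.93, without DS or any leading whitespace. When I modify the regex as:
(?DS[ \t]*)([+\-][ \t]*\d{1,2}\.\d{2})
I get the error "Unrecognized grouping construct", and I don't understand why. I think my non-capturing group is incorrect, but I can't see why or where.
The error comes from (?DS: after (? .NET expects a specific construct character (for example : for a non-capturing group, = or ! for lookaheads, <= or <! for lookbehinds), and D is not valid there. A non-capturing group (?:DS[ \t]*) would compile, but it would still consume DS and the whitespace as part of the match. You may use a positive lookbehind here instead (note the (?<= opener):
(?<=DS[ \t]*)[+-][\t ]*\d{1,2}\.\d{2}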
See the regex demo
To make sure you match the number and DS as whole words (with no letters, digits or _ before or after them), use word boundaries:
(?<=\bDS[ \t]*)[+-][\t ]*\d{1,2}\.\d{2}\b
Or a negative lookahead (?!\d) after \d{2}:
(?<=\bDS[ \t]*)[+-][\t ]*\d{1,2}\.\d{2}(?!\d)
See another regex demo.
Details
(?<=\bDS[ \t]*) - a positive lookbehind that matches a location in the string that is immediately preceded by DS as a whole word followed by 0+ spaces or tabs
[+-] - a + or -
[\t ]* - 0+ spaces or tabs
\d{1,2} - 1 or 2 digits
\. - a dot
\d{2} - 2 digits
(?!\d) - no digit allowed immediately to the right of the current location.
VB.NET demo:
Imports System.Text.RegularExpressions
Module Program
    Sub Main()
        Dim my_rx As Regex = New Regex("(?<=\bDS[ \t]*)[+-][\t ]*\d{1,2}\.\d{2}(?!\d)")
        Dim my_result As Match = my_rx.Match(" +1.41 DS +0.93 DC x 3* #12.5 mm (4.00 Rx Calc)")
        If my_result.Success Then
            System.Console.WriteLine(my_result.Value) ' => +0.93
        End If
    End Sub
End Module
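If you need the same extraction outside .NET, keep in mind that lookbehind support varies between engines. Python's built-in re module, for example, only accepts fixed-width lookbehinds, so (?<=\bDS[ \t]*) is rejected there. Below is a sketch of a common workaround in Python, which swaps the lookbehind for a capturing group (a different technique from the answer's):

```python
import re

s = " +1.41 DS +0.93 DC x 3* #12.5 mm (4.00 Rx Calc)"

# Python's re refuses variable-width lookbehinds such as (?<=\bDS[ \t]*)
try:
    re.compile(r"(?<=\bDS[ \t]*)[+-][\t ]*\d{1,2}\.\d{2}(?!\d)")
    lookbehind_ok = True
except re.error:
    lookbehind_ok = False

# Workaround: match DS and the whitespace, but capture only the number
rx = re.compile(r"\bDS[ \t]*([+-][\t ]*\d{1,2}\.\d{2})(?!\d)")
m = rx.search(s)
if m:
    print(m.group(1))  # +0.93
print("variable-width lookbehind accepted:", lookbehind_ok)
```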
Using ECMAScript 6 RegExp
From this: "-=Section A=- text A -=Section B=- text b"
I want to get this: ['Section A', 'text A', 'Section B', 'text B']
Apart from the delimiters, everything else is variable. (Eventually '-=someString=-' will be '' but for now I did not want to clutter things up or create errors with characters that need escaping.)
I am not a regex expert, but I have searched all day for an example or guidance to make this work without success.
For example using this code:
let templateString = "-=Section A=- text A -=Section B=- text b";
let regex = RegExp('-=(.*?)=-(.*?)','g');
I only get this: ["-=Section A=-", "Section A", ""]
I am not sure how to make the second capture group capture 'text A'. I also do not understand why the g modifier does not make it continue after the first match and go on to find 'Section B' and 'text B'.
Any pointers to some examples would be appreciated - I have failed to find any.
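Note that (.*?) at the end of the pattern always matches an empty string: it is lazy, and since nothing follows it in the pattern, the engine never has to let it consume any characters. That is why text A is not captured and each match ends right after =-. As for the g modifier: a single exec call returns only one match, even with g; the flag makes each subsequent exec call continue from where the previous match ended (regex.lastIndex), so you need to call exec in a loop to reach Section B.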
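The empty trailing group is easy to reproduce. Here is a Python sketch using the question's pattern (Python is used only for illustration; re.findall plays the role of looping over all matches with the g flag):

```python
import re

template = "-=Section A=- text A -=Section B=- text b"

# The question's pattern: the trailing (.*?) has nothing after it, so it matches ""
rx = re.compile(r"-=(.*?)=-(.*?)")

print(rx.search(template).groups())  # ('Section A', '')
print(rx.findall(template))          # [('Section A', ''), ('Section B', '')]
```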
You may use
let templateString = "-=Section A=- text A -=Section B=- text b";
let regex = /\s*-=(.*?)=-\s*/;
console.log(templateString.split(regex).filter(Boolean));
The \s*-=(.*?)=-\s* pattern finds
\s* - 0+ whitespaces
-= - a -= substring
(.*?) - Group 1: any 0+ chars, as few as possible up to the first occurrence of the subsequent subpatterns
=- - a =- substring
\s* - 0+ whitespaces.
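When the separator regex contains capturing groups, String#split also adds the captured substrings (here, Group 1) to the resulting array; .filter(Boolean) then removes the empty string produced before the first delimiter.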
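Python's re.split behaves the same way when the separator pattern has a capturing group, so the split approach carries over almost verbatim. A minimal Python sketch, where filter(None, ...) stands in for .filter(Boolean):

```python
import re

template = "-=Section A=- text A -=Section B=- text b"

# Captured Group 1 values are kept in the result, like in String#split
parts = re.split(r"\s*-=(.*?)=-\s*", template)
print(parts)                      # ['', 'Section A', 'text A', 'Section B', 'text b']
print(list(filter(None, parts)))  # ['Section A', 'text A', 'Section B', 'text b']
```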
If you want to use a matching approach instead, Group 2 has to match zero or more chars that do not start the leading delimiter sequence, which seems to be -= in your scenario:
let templateString = "-=Section A=- text A -=Section B=- text b";
let regex = /-=(.*?)=-\s*([^-]*(?:-(?!=)[^-]*)*)/g;
let m, res=[];
while (m=regex.exec(templateString)) {
res.push([m[1], m[2].trim()]);
}
console.log(res);
See this regex demo
Details
-=(.*?)=-\s* - same as in the first regex (see the split regex above)
([^-]*(?:-(?!=)[^-]*)*) - Group 2 that matches and captures:
[^-]* - 0+ chars other than -
(?: - start of a non-capturing group that matches
-(?!=) - a hyphen that is not immediately followed with =
[^-]* - 0+ chars other than -
)* - ...zero or more times
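The matching approach translates to Python's re.finditer, which plays the role of the while (regex.exec(...)) loop. A sketch assuming the same sample string, plus a hypothetical second input with a hyphen inside the text part to show what -(?!=) allows:

```python
import re

template = "-=Section A=- text A -=Section B=- text b"
# Hypothetical extra input with a hyphen inside the text part
template2 = "-=Section A=- pre-text A -=Section B=- text b"

# Group 1: section name; Group 2: everything up to the next "-=",
# where a hyphen is allowed only if it is not followed by "="
rx = re.compile(r"-=(.*?)=-\s*([^-]*(?:-(?!=)[^-]*)*)")

def sections(text):
    # finditer walks over all matches, like the exec loop in the JavaScript version
    return [[m.group(1), m.group(2).strip()] for m in rx.finditer(text)]

print(sections(template))   # [['Section A', 'text A'], ['Section B', 'text b']]
print(sections(template2))  # [['Section A', 'pre-text A'], ['Section B', 'text b']]
```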
If I have the following example:
X-FileName: pallen (Non-Privileged).pst
Here is our forecast
Message-ID: <15464986.1075855378456.JavaMail.evans#thyme>
How can I select the text
Here is our forecast
that comes after "X-FileName .... \n" and before "Message-ID" (excluded)?
I read about lookahead and lookbehind and tried this, but it didn't work:
(?<=X-FileName:(\n)+$).+(?=Message-ID:)
This should do it:
(?:X-FileName:[^\n]+)\n+([^\n]+)\n+(?:Message-ID:) (group #1 is the match)
Demo
Explanation:
(?:X-FileName:[^\n]+) matches X-FileName: followed by one or more characters that aren't newlines, without capturing it (?:).
\n+ matches one or more consecutive newlines.
([^\n]+) matches and captures one or more consecutive characters that aren't newlines.
\n+, again, matches one or more consecutive newlines.
(?:Message-ID:) matches Message-ID: without capturing it (?:).
Edit: as @WiktorStribiżew mentioned, though, splitting your text into lines may be an easier and cleaner way to retrieve what you want.
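To try the lookaround-free pattern, here is a Python sketch. It assumes \n line endings and blank lines between the headers and the body, as in the question; the header values are copied from the question.

```python
import re

msg = ("X-FileName: pallen (Non-Privileged).pst\n\n"
       "Here is our forecast\n\n"
       "Message-ID: <15464986.1075855378456.JavaMail.evans#thyme>")

# Group 1 holds the line between the X-FileName and Message-ID lines
rx = re.compile(r"(?:X-FileName:[^\n]+)\n+([^\n]+)\n+(?:Message-ID:)")

m = rx.search(msg)
if m:
    print(m.group(1))  # Here is our forecast
```

Because ([^\n]+) cannot cross a newline, this pattern only works when the text between the two headers is a single line.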
There are two approaches here, and they depend on the broader context. If your expected substring is the second paragraph, just split with \n\n (or \r\n\r\n) and get the second item from the resulting list.
If it is a text inside some larger text, use a regex.
See a Python demo:
import re
# Paragraphs are separated by blank lines (\n\n)
s = ('X-FileName: pallen (Non-Privileged).pst\n\n'
     'Here is our forecast\n\n'
     'Message-ID: <15464986.1075855378456.JavaMail.evans#thyme>')
# Non-regex way for the string in the exact same format
print(s.split('\n\n')[1])
# Regex way to get some substring in a known context
m = re.search(r'X-FileName:.*[\r\n]+(.+)', s)
if m:
    print(m.group(1))
The regex means:
X-FileName: - a literal substring
.* - any 0+ chars other than a newline (\n)
[\r\n]+ - 1 or more CR or LF chars
(.+) - Group 1: one or more chars other than a newline, as many as possible.
See the regex demo.