Regex to find a date BEFORE a word

I am trying to write a RegEx statement to locate the first date BEFORE a specific word.
I've used the below Regex to show the first date AFTER a specific word.
Word +\K(?:([0-9]+)/([0-9]+)/([0-9]+)|((0?[1-9]|1[0-2])-(0?[1-9]|[12]\d|3[01])-(\d{4}|\d{2}))|\w+\s\d{2},\s\d{4}|(?i)\b(Jan(?:uary|.)?|Feb(?:ruary|.)?|Mar(?:ch|.)?|Apr(?:il|.)?|May|Jun(?:e|.)?|Jul(?:y|.)?|Aug(?:ust|.)?|Sep(?:tember|.)?|Oct(?:ober|.)?|Nov(?:ember|.)?|Dec(?:ember|.)?)( ,?[ ]|-(?:0?[1-9]|[1-2][0-9]|3[01])-)(\d{4}))
Here is an example of what I want it to return.
Many words here 01/07/2019 02/03/2019 02/08/2019 More words here.
In this case it should return the date 02/08/2019. How can I change the above statement to locate a date BEFORE a specified word?
I test in Notepad++, if that helps determine which regex flavor I should use.
Bonus question: sometimes the word to match on may be on a new line. Can regex still match on that? For example it may be formatted as shown below where the word "More" is on a new line:
Many words here
01/07/2019
02/03/2019
02/08/2019
More words here

You could add a positive lookahead (?=\h+More\b) at the end of your date-like pattern to assert that what follows is one or more horizontal whitespace characters, then the word More and a word boundary.
(?:([0-9]+)/([0-9]+)/([0-9]+)|((0?[1-9]|1[0-2])-(0?[1-9]|[12]\d|3[01])-(\d{4}|\d{2}))|\w+\s\d{2},\s\d{4}|(?i)\b(Jan(?:uary|.)?|Feb(?:ruary|.)?|Mar(?:ch|.)?|Apr(?:il|.)?|May|Jun(?:e|.)?|Jul(?:y|.)?|Aug(?:ust|.)?|Sep(?:tember|.)?|Oct(?:ober|.)?|Nov(?:ember|.)?|Dec(?:ember|.)?)( ,?[ ]|-(?:0?[1-9]|[1-2][0-9]|3[01])-)(\d{4}))(?=\h+More\b)
If the word can be on a new line, you could change \h to \s.
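A minimal Python sketch of the lookahead idea, under a few assumptions: the date pattern is cut down to the slash-separated form from the example, `date_before` is a hypothetical helper name, and `[ \t]` stands in for `\h`, which Python's `re` module does not support.

```python
import re

# Assumption: a simplified date pattern covering only the dd/mm/yyyy-style
# dates from the example, not the full alternation in the original regex.
DATE = r"\d{1,2}/\d{1,2}/\d{2,4}"

def date_before(word, text, allow_newline=False):
    """Return the date directly followed by whitespace and `word`, or None."""
    # [ \t]+ approximates \h+ (horizontal whitespace); \s+ also allows newlines.
    gap = r"\s+" if allow_newline else r"[ \t]+"
    pattern = rf"{DATE}(?={gap}{re.escape(word)}\b)"
    match = re.search(pattern, text)
    return match.group(0) if match else None

single = "Many words here 01/07/2019 02/03/2019 02/08/2019 More words here."
multi = "Many words here\n01/07/2019\n02/03/2019\n02/08/2019\nMore words here"

print(date_before("More", single))                     # 02/08/2019
print(date_before("More", multi))                      # None
print(date_before("More", multi, allow_newline=True))  # 02/08/2019
```

The lookahead keeps the word itself out of the match, so only the date is returned.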

Related

How to write regex expression to match words that start with special characters and end with a group of words

I am working on the following regex problem that I have almost solved.
The goal is to find the words that start with either a special character or a space and end with one of these extensions, including the period: .qvd, .txt, .xlsx
For example
"list.xlsx random %ford.txt #catch.qvd cars roads"
From above string I need to extract the following
list.xlsx , ford.txt and catch.qvd
[#%\S]\w+\.+txt
My solution only matches the words that end with .txt. How can I change my regex to include .qvd and .xlsx too?
In this pattern [#%\S]\w+\.+txt the \S also matches # and %, so the character class is the same as \S and the whole pattern is the same as \S\w+\.+txt.
That requires a match that starts with a non-whitespace char, so the "special chars" are included in the match, and the part before the dot must be at least 2 characters long.
If there can be either a "special char" or a space or the start of the string to the left, you can start the match directly with word characters, followed by matching any of the alternatives using a non capture group (?:txt|qvd|xlsx) and a word boundary \b at the end to prevent a partial word match.
\w+\.(?:txt|qvd|xlsx)\b
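As a quick check of the pattern above, here is a small Python sketch run against the example string from the question. The variable names are illustrative only.

```python
import re

# Word characters, a literal dot, then one of the allowed extensions.
# The trailing \b prevents partial matches such as "notes.txtx".
EXT = re.compile(r"\w+\.(?:txt|qvd|xlsx)\b")

text = "list.xlsx random %ford.txt #catch.qvd cars roads"
names = EXT.findall(text)
print(names)  # ['list.xlsx', 'ford.txt', 'catch.qvd']
```

Because the match starts at the first word character, the leading % and # are left out of the result.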
Use the | (alternation operator/metacharacter) to express an "or" relation between two or more subexpressions:
(?<!\S)[#%]\S+\.(?:txt|qvd|xlsx)

Regex - Matching Strings with a Single Character

I'm fairly new to regex. I'm looking for an expression which will return results which meet the following criteria:
The first word must be 3 letters or more
The last word must be 3 letters or more
If any word or words between the first and last word contain ONLY 1 letter, then return that phrase
Every other word between the first and last word (apart from the single-letter words) must be 3 letters or more
I would like it to return phrases like:
'Therefore a hurricane shall arrive' and 'However I know I like Michael Smith'
There should be a space between each word.
So far I have:
^([A-Za-z]{3,})*$( [A-Za-z])*$( [A-Za-z]{3,})*$
Any help would be appreciated. Is it something to do with the spacing? I'm using an application called 'Oracle EDQ'.
In a normal regex world you'd use a \b, a word boundary.
^[a-zA-Z]{3,}(\s+|\b([a-zA-Z]|[a-zA-Z]{3,})\b)*\s+[a-zA-Z]{3,}$
(note the two \b word boundaries around the inner group)
And perhaps, non-capturing groups (as anubhava shows).
From what I see, Oracle EDQ regex syntax supports neither word boundaries nor non-capturing groups. You should rely on the \s pattern, which matches whitespace.
So, make it obligatory, either with
^[a-zA-Z]{3,}(\s+|\s([a-zA-Z]|[a-zA-Z]{3,}))*\s+[a-zA-Z]{3,}$
(note the \s before the inner group)
OR
^[a-zA-Z]{3,}(\s+|([a-zA-Z]|[a-zA-Z]{3,})\s)*\s*[a-zA-Z]{3,}$
(note the \s after the inner group and the \s* after the outer group)
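To sanity-check the whitespace-only variant outside Oracle EDQ, a rough Python sketch follows. It uses the first \s-based pattern unchanged. `is_valid_phrase` is a hypothetical helper name, and Python's regex flavor may differ from EDQ's in other respects.

```python
import re

# First whitespace-only variant from the answer: no \b, no non-capturing groups.
PHRASE = re.compile(
    r"^[a-zA-Z]{3,}(\s+|\s([a-zA-Z]|[a-zA-Z]{3,}))*\s+[a-zA-Z]{3,}$"
)

def is_valid_phrase(phrase):
    """True if the whole phrase satisfies the word-length rules."""
    return PHRASE.search(phrase) is not None

for sample in [
    "Therefore a hurricane shall arrive",
    "However I know I like Michael Smith",
    "Therefore an arrival",   # "an" has two letters, so this should fail
    "Hi there friend",        # first word is too short
]:
    print(sample, "->", is_valid_phrase(sample))
```

The two example phrases from the question pass, while a two-letter word anywhere in the phrase makes the match fail.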
You can use this regex:
^[a-zA-Z]{3,}(?:\s+|(?:[a-zA-Z]|[a-zA-Z]{3,}))*\s+[a-zA-Z]{3,}$

Find all strings not preceded by another, with anything in between in notepad++

I know how to use a simple negative lookbehind:
#(?<!first word)\r\nsecond word#s
This will not find second word in
some text
first word
second word
some text
and matches as expected in
some text
second word
some text
It also matches here, but it should not
some text
first word
any other text
second word
some text
How do I need to modify my regular expression to meet the requirements ?
I tried #(?<!first word).*second word#s, but it always matches.
I need this to search through many files in notepad++
Your first regex matches the third example because the lookbehind only inspects the text immediately before the line break: that text is any other text, not first word, and second word follows on the next line.
The second regex always matches because .* can match any amount of text, so the engine can always pick a starting position where the lookbehind succeeds.
I suggest putting the .* inside a negative lookahead instead.
I don't know which editor you are using, so please correct me if this does not match your regex syntax.
I would search for the longest stretch of text that does not contain first word and is followed by second word, like this:
^(?!.*first word.*)\r\nsecond word
I hope it will work.
Good luck!
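For comparison, here is a Python sketch of a different technique from the one above, a tempered dot anchored at the start of the text. It assumes one particular reading of the requirement: second word counts only if first word appears nowhere earlier in the text. The helper name is hypothetical, and the pattern reports at most one hit per text.

```python
import re

# Tempered dot: consume characters one at a time from the very start of the
# text, but never step onto a position where "first word" begins.
# Assumption: "first word" anywhere earlier in the text disqualifies the match.
PATTERN = re.compile(r"\A(?:(?!first word).)*?second word", re.DOTALL)

def has_unpreceded_second_word(text):
    """True if "second word" occurs before any "first word" in the text."""
    return PATTERN.search(text) is not None

adjacent = "some text\nfirst word\nsecond word\nsome text"
alone = "some text\nsecond word\nsome text"
separated = "some text\nfirst word\nany other text\nsecond word\nsome text"

print(has_unpreceded_second_word(adjacent))   # False
print(has_unpreceded_second_word(alone))      # True
print(has_unpreceded_second_word(separated))  # False
```

Because the scan is anchored at the start of the text, it fits a "does this file qualify" search across many files better than a search for every occurrence.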

Regex to match first word in sentence

I am looking for a regex that matches first word in a sentence excluding punctuation and white space. For example: "This" in "This is a sentence." and "First" in "First, I would like to say \"Hello!\""
This doesn't work:
"""([A-Z].*?(?=^[A-Za-z]))""".r
(?:^|(?:[.!?]\s))(\w+)
Will match the first word in every sentence.
http://rubular.com/r/rJtPbvUEwx
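A short Python sketch of that pattern, using sample text made up for illustration. With a single capture group, `re.findall` returns just the captured words.

```python
import re

# Start of string, or sentence-ending punctuation plus one whitespace char,
# followed by the captured word.
FIRST_WORDS = re.compile(r"(?:^|(?:[.!?]\s))(\w+)")

text = "This is a sentence. Another one! Third?"
print(FIRST_WORDS.findall(text))  # ['This', 'Another', 'Third']

quoted = 'First, I would like to say "Hello!"'
print(FIRST_WORDS.findall(quoted))  # ['First']
```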
This is an old thread but people might need this like I did.
None of the above works if your sentence starts with one or more spaces.
I did this to get the first (non empty) word in the sentence :
(?<=^[\s"']*)(\w+)
Explanation:
(?<=^[\s"']*) positive lookbehind in order to look for the start of the string, followed by zero or more spaces or punctuation characters (you can add more between the brackets), but do not include it in the match.
(\w+) the actual match of the word, which will be returned
The following words in the sentence are not matched as they do not satisfy the lookbehind.
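Not every engine accepts a variable-width lookbehind like the one above. Python's built-in `re` module, for example, rejects it. A sketch of the same idea with a capture group instead (the helper name `first_word` is hypothetical):

```python
import re

# Skip leading whitespace and quote characters, then capture the first word.
# A capture group replaces the lookbehind, which Python's re cannot compile
# when its width is variable.
FIRST_WORD = re.compile(r"""^[\s"']*(\w+)""")

def first_word(sentence):
    """Return the first word of the sentence, or None if there is none."""
    match = FIRST_WORD.search(sentence)
    return match.group(1) if match else None

print(first_word('   "Hello!" she said'))  # Hello
print(first_word("This is a sentence."))   # This
```

The skipped prefix is still consumed by the overall match, so the word has to be read from group 1 rather than from the full match.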
You can use this regex: ^[^\s]+ or ^[^ ]+.
You can use this regex: ^\s*([a-zA-Z0-9]+).
The first word can be found at a captured group.
[a-z]+
This should be enough as it will get the first a-z characters (assuming case-insensitive).
In case it doesn't work, you could try [a-z]+\b, or even ^[a-z]+\b, but the last one assumes that the string starts with the word.

How to match whole word that is preceded by a tab?

I am trying to get the first word in the line that matches the whole word 'number'. But I am only interested where whole word 'number' is matched and is preceded by a tab.
For example if following is the text:
tin identification number 4/10/2007 LB
num number 9/27/2006 PAT
I want to get back num
Regex I have is:
match whole word: \bnumber\b
if above is found then get first word: ([^\s]*)
I think I need modification in match whole word regex so that it only matches when whole word is preceded by a tab
This answer depends a bit on your regex engine, as engines can have different representations for a tab. In the .NET regex engine, though, it would look like:
\tnumber
Try a lookahead:
([^\s]+)(?=.*\tnumber)
(?:(\t([^\t ]*)))
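A Python sketch of the lookahead suggestion, with two additions that are assumptions rather than part of the answers: a ^ anchor with `re.MULTILINE` so only the first word of each line is returned, and a trailing \b so that "numbers" does not count. The sample text also assumes the columns are tab-separated, with a plain space before "number" on the first line.

```python
import re

# First word of a line, provided a tab followed by the whole word "number"
# appears later on the same line (. does not cross newlines by default).
FIRST_IF_TAB_NUMBER = re.compile(r"^(\S+)(?=.*\tnumber\b)", re.MULTILINE)

# Assumed layout: tabs between columns, a space inside "identification number".
text = (
    "tin identification number\t4/10/2007\tLB\n"
    "num\tnumber\t9/27/2006\tPAT"
)

print(FIRST_IF_TAB_NUMBER.findall(text))  # ['num']
```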