I am trying to get the first word of a line that contains the whole word 'number', but only when that whole word 'number' is preceded by a tab.
For example, if the following is the text:
tin identification number 4/10/2007 LB
num number 9/27/2006 PAT
I want to get back num.
The regexes I have are:
match whole word: \bnumber\b
if above is found then get first word: ([^\s]*)
I think I need to modify the whole-word regex so that it only matches when the word is preceded by a tab.
This answer depends a bit on your regex engine, as engines can have different representations for a tab. In the .NET regex engine, though, it would look like this:
\tnumber
Try a lookahead:
([^\s]+)(?=.*\tnumber)
(?:(\t([^\t ]*)))
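A minimal Python sketch of the lookahead idea, for checking it outside an editor. It assumes the fields in the sample are separated by literal tabs (the tabs may have been lost when the question was copied), it uses Python's re module rather than the .NET engine, and it adds a ^ anchor with re.MULTILINE so that only the first word of each line is returned.

```python
import re

# Assumption: fields are separated by literal tabs, so "num" is followed by
# a tab before "number" on the second line; the first line uses spaces.
text = (
    "tin identification number\t4/10/2007\tLB\n"
    "num\tnumber\t9/27/2006\tPAT\n"
)

# ^(\S+) captures the first word of a line (re.MULTILINE makes ^ match at
# every line start). The lookahead requires that the rest of the same line
# contains a tab followed by the whole word "number"; [^\n]* keeps the
# lookahead from crossing into the next line.
pattern = re.compile(r"^(\S+)(?=[^\n]*\tnumber\b)", re.MULTILINE)

first_words = pattern.findall(text)
print(first_words)  # ['num']
```

The first line is skipped because its "number" is preceded by a space, not a tab.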
I am trying to write a RegEx statement to locate the first date BEFORE a specific word.
I've used the regex below to find the first date AFTER a specific word:
Word +\K(?:([0-9]+)/([0-9]+)/([0-9]+)|((0?[1-9]|1[0-2])-(0?[1-9]|[12]\d|3[01])-(\d{4}|\d{2}))|\w+\s\d{2},\s\d{4}|(?i)\b(Jan(?:uary|.)?|Feb(?:ruary|.)?|Mar(?:ch|.)?|Apr(?:il|.)?|May|Jun(?:e|.)?|Jul(?:y|.)?|Aug(?:ust|.)?|Sep(?:tember|.)?|Oct(?:ober|.)?|Nov(?:ember|.)?|Dec(?:ember|.)?)( ,?[ ]|-(?:0?[1-9]|[1-2][0-9]|3[01])-)(\d{4}))
Here is an example of what I want it to return.
Many words here 01/07/2019 02/03/2019 02/08/2019 More words here
In this case it should return the date 02/08/2019. How can I change the above statement to locate a date BEFORE a specified word?
I use Notepad++ to test, if that helps determine which regex flavor I'm using.
Bonus question: sometimes the word to match on may be on a new line. Can a regex still match in that case? For example, it may be formatted as shown below, where the word "More" is on a new line:
Many words here
01/07/2019
02/03/2019
02/08/2019
More words here
You could use a positive lookahead (?=\h+More\b) at the end of your date-like pattern to assert that what follows is one or more horizontal whitespace characters, followed by More and a word boundary.
(?:([0-9]+)/([0-9]+)/([0-9]+)|((0?[1-9]|1[0-2])-(0?[1-9]|[12]\d|3[01])-(\d{4}|\d{2}))|\w+\s\d{2},\s\d{4}|(?i)\b(Jan(?:uary|.)?|Feb(?:ruary|.)?|Mar(?:ch|.)?|Apr(?:il|.)?|May|Jun(?:e|.)?|Jul(?:y|.)?|Aug(?:ust|.)?|Sep(?:tember|.)?|Oct(?:ober|.)?|Nov(?:ember|.)?|Dec(?:ember|.)?)( ,?[ ]|-(?:0?[1-9]|[1-2][0-9]|3[01])-)(\d{4}))(?=\h+More\b)
If the word can be on a new line, you could change \h to \s.
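The lookahead can be tried in Python with a simplified pattern. This sketch keeps only the mm/dd/yyyy branch of the long date pattern (an assumption made for brevity), and because Python's re module has no \h, it uses [ \t]+ in its place. It checks both the single-line sample and the sample where "More" is on its own line.

```python
import re

# Simplified: only the mm/dd/yyyy branch of the original date pattern is kept.
DATE = r"\d+/\d+/\d+"

# Python's re has no \h, so [ \t]+ stands in for "horizontal whitespace".
same_line = re.compile(DATE + r"(?=[ \t]+More\b)")
# \s also matches \r and \n, so this version allows "More" on the next line.
any_line = re.compile(DATE + r"(?=\s+More\b)")

text1 = "Many words here 01/07/2019 02/03/2019 02/08/2019 More words here"
text2 = "Many words here\n01/07/2019\n02/03/2019\n02/08/2019\nMore words here"

print(same_line.search(text1).group())  # 02/08/2019
print(same_line.search(text2))          # None, because "More" is on the next line
print(any_line.search(text2).group())   # 02/08/2019
```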
I'm sorry, I can't formulate a good question:
This regex should find the word 'period' followed by a whitespace character and one digit:
period.*(?=\s[0-9]{1})|alternative
If I input the line TEST 2019 to period 3.csv, the regex matches period.
If I input the line TEST period 3 2019.csv, the regex matches period 3.
My intended match is period 3.
You can see what I mean in this screenshot from regex101:
For now I have solved it with a positive lookbehind like this:
(?<=period\s)[0-9]{1,4}|alternative
This matches the digit after 'period' and I can just add 'period' for my specific purpose. But I don't understand why I get different matches.
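You don't need .* after period, so just remove it from your regex. The .* is greedy: it runs to the end of the line and then backtracks only until the lookahead (?=\s[0-9]) succeeds, so the match stretches to the last whitespace followed by a digit. In TEST 2019 to period 3.csv that position is right after period, but in TEST period 3 2019.csv it is right before 2019, which is why you get period 3 there. Write it like this: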
period(?=\s[0-9]{1})|alternative
This matches period literally when it is followed by a whitespace character and a digit (ensured by your positive lookahead). You also don't need to write {1}, as that is the default and is redundant. And if you don't want period to match as part of a larger word, put a word boundary \b before it and change your regex to this:
\bperiod(?=\s[0-9])|alternative
Also, your lookbehind version (?<=period\s)[0-9]{1,4}|alternative does not match the text period at all; that lookbehind just matches the number (one to four digits) that is preceded by period and one whitespace character.
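The difference between the patterns can be reproduced with Python's re module, which treats these constructs the same way for this input. This sketch drops the |alternative branch (a placeholder in the question) and runs each pattern on the two sample lines; only the greedy .* version changes its result between them.

```python
import re

lines = ["TEST 2019 to period 3.csv", "TEST period 3 2019.csv"]

# The question's pattern: greedy .* backtracks only as far as needed for the
# lookahead, so it stops before the LAST "whitespace + digit" on the line.
greedy = re.compile(r"period.*(?=\s[0-9])")
# The answer's pattern: the lookahead is checked right after "period".
fixed = re.compile(r"\bperiod(?=\s[0-9])")
# The question's lookbehind workaround: matches only the number itself.
behind = re.compile(r"(?<=period\s)[0-9]{1,4}")

results = [
    (greedy.search(s).group(), fixed.search(s).group(), behind.search(s).group())
    for s in lines
]
print(results)
# [('period', 'period', '3'), ('period 3', 'period', '3')]
```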
I know how to use a simple negative lookbehind:
#(?<!first word)\r\nsecond word#s
This will not find second word in
some text
first word
second word
some text
and matches as expected in
some text
second word
some text
It also matches here, but it should not:
some text
first word
any other text
second word
some text
How do I need to modify my regular expression to meet this requirement?
I tried #(?<!first word).*second word#s, but it always matches.
I need this to search through many files in Notepad++.
Your first regexp matches the third example because the lookbehind only checks the text immediately before \r\nsecond word; it succeeds whenever that text is not first word, no matter what appears earlier.
The last regexp matches everything because the lookbehind is only checked where the match starts, and .* (which also crosses line breaks under the s flag) can then run past first word to reach second word.
I suggest adding a .* inside a negative lookahead instead.
I don't know which editor you are using, so please correct this if it doesn't match your editor's regex syntax.
I would search for the longest possible string that does not contain first word, followed by second word, like this:
^(?!.*first word.*)\r\nsecond word
I hope it will work.
Good luck!
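For comparison, here is a Python sketch of a different technique, a tempered greedy token, which is not the regex proposed above. It assumes that first word appearing anywhere earlier in the text should block the match, and it uses \n instead of \r\n for simplicity. The first two patterns reproduce the question's attempts; carrying the third one over to Notepad++ unchanged is an assumption.

```python
import re

samples = {
    "blocked_adjacent": "some text\nfirst word\nsecond word\nsome text",
    "allowed": "some text\nsecond word\nsome text",
    "blocked_distant": "some text\nfirst word\nany other text\nsecond word\nsome text",
}

# The question's first attempt (with \n instead of \r\n): only looks at the
# ten characters directly before the line break.
adjacent = re.compile(r"(?<!first word)\nsecond word")
# The question's second attempt: the lookbehind is tested at the match start,
# then .* (DOTALL lets it cross newlines) runs straight past "first word".
naive = re.compile(r"(?<!first word).*second word", re.DOTALL)
# Tempered greedy token: from the start of the text, consume one character at
# a time, never at a position where "first word" begins, until "second word".
tempered = re.compile(r"\A(?:(?!first word).)*?(second word)", re.DOTALL)

results = {
    name: [bool(p.search(text)) for p in (adjacent, naive, tempered)]
    for name, text in samples.items()
}
print(results)
```

Only the tempered pattern rejects both blocked samples while still matching the allowed one; group 1 holds the matched second word.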
I'm trying to match the last four characters (alphanumeric) of all words beginning with the sequence &c.
For instance, in the string below, I'd like to match the last four characters of each code (2AC3 and DE4A):
Colour one is &cFF2AC3 and colour two is &c22DE4A.
Can anybody help me with the correct regex expression? I've spent hours on this great resource to no avail.
It looks like hexadecimal numbers, so use this pattern:
&c[0-9A-F]{2}\K([0-9A-F]{4})
This:
/(?i)\s*&c(?:[a-z0-9]{2})([a-z0-9]{4})\b/
Append a g flag to the end of it if you want it to find all matches in a given text.
Try this:
/(?:^| )&c\w*(\w{4})\b/
If you want to try it in the regex tester you linked to, make sure to use the g modifier to see all matches.
Explanation: (?:^| ) matches either a space or the start of the string; &c\w* matches &c and then as many word characters as possible (backtracking so that four are left over); and \w{4} captures the last four characters. The \b at the end asserts a word boundary (the next character is a non-word character, or the string ends).
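Both capture-group answers can be checked in Python. Python's re module does not support \K, so only capture-group forms are shown, and findall() returns every match, which covers what the g flag does in JavaScript-style testers.

```python
import re

text = "Colour one is &cFF2AC3 and colour two is &c22DE4A."

# Hex-digit version (uppercase only, as in the \K answer), using a capture
# group instead of \K to keep just the last four characters.
hex_pattern = re.compile(r"&c[0-9A-F]{2}([0-9A-F]{4})")
# Word-character version from the answer above: \w* backtracks so that
# exactly four word characters remain for the capture group.
word_pattern = re.compile(r"(?:^| )&c\w*(\w{4})\b")

print(hex_pattern.findall(text))   # ['2AC3', 'DE4A']
print(word_pattern.findall(text))  # ['2AC3', 'DE4A']
```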
Statement:
Say we have the following three records, and we want to match only the first one: exactly one digit followed by a specific word. What regular expression can do this (in Notepad++)?
2Cups
11Cups
222Cups
The expressions I tried and their problems are:
Proposal 1: \d{1}Cups
It finds the "1Cups" and "2Cups" substrings in the second and third records respectively, which we do not want here.
Proposal 2: [^0-9]+[0-9]Cups
Same problem as above.
(PS: the records can be "XX 2Cups", "YY22Cups" and "XYZ 333Cups", i.e., we can make no assumption about the position of the matchable part.)
Any suggestions?
Reference:
[1] The regex definition in Notepad++ (same as SciTE)
As mentioned in Searching for a complex Regular Expression to use with Notepad++, it is: http://www.scintilla.org/SciTERegEx.html
[2] Matching exact number of digits
Here is an example: regular expression to match exactly 5 digits.
However, here we do not want to find the matchable substring inside longer records.
If the string actually has a numbered sequence (1. 2Cups 2. 11Cups), you can use the whitespace that follows each list number:
\s\d{1}Cups
If there isn't the numbered list before, but the string will be at the beginning of the line, you can anchor it there:
^\d{1}Cups
Tested in Notepad++ v6.5.1 (Unicode).
It sounds like you want to match the digit only at the start of the string or if it has a space before it, so this would work:
(^|\b)\dCups
Explanation:
(^|\b) Match the start of the string or beginning of a word (technically, word break)
\d Match a digit ({1} is redundant)
Cups Match Cups
This will work:
\b\dCups
If "Cups" must be a whole word (i.e., not matching 2Cupsizes):
\b\dCups\b
Note that \b also matches at the start or end of the input.
I found one possible solution:
Using ^\d{1}Cups to match the "starts with one digit + Cups" cases, as suggested by Ken, Cottrell and Bohemian.
Using [^\d]\dCups to match the other cases.
However, I haven't found a single regex that solves the problem yet.
Have a try with:
(?:^|\D)\dCups
This will match xCups (x being a single digit) only if there is no digit before it.
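A Python sketch comparing the two single-regex answers on the records from the question. The extra record 2Cupsizes is a hypothetical input added to show where \b\dCups\b and (?:^|\D)\dCups differ.

```python
import re

records = ["2Cups", "11Cups", "222Cups", "XX 2Cups", "YY22Cups", "XYZ 333Cups", "2Cupsizes"]

patterns = {
    # Word boundary before the digit, and "Cups" must end a word.
    "word_boundary": re.compile(r"\b\dCups\b"),
    # Start of string or any non-digit before the single digit.
    "non_digit_or_start": re.compile(r"(?:^|\D)\dCups"),
}

# For each pattern, keep the records in which it finds a match.
hits = {name: [r for r in records if p.search(r)] for name, p in patterns.items()}
print(hits)
```

Both reject 11Cups, 222Cups, YY22Cups and XYZ 333Cups; only the trailing \b keeps 2Cupsizes out.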