Regular Expression to Capture First Two Lines That Don't Include String - regex

I am struggling to find a method to extract the first two lines of an address using a regular expression, where it doesn't include the word "Account".
If we take this address:
Company Name
Some Road
Some Town
I can use the regular expression (?:.*\s*){2} to return:
Company Name
Some Road
Which is great.
However, if there is an extra line at the top, making the address become:
Accounts Payable
Company Name
Some Road
Some Town
Then it no longer picks up the two lines that I want.
I have tried the method from "Regular expression to match a line that doesn't contain a word?" without success, and have also tried combinations such as (?!Account.*)(?:.*\s*){3}, but none of them worked.
The Microsoft website https://learn.microsoft.com/en-us/dotnet/standard/base-types/regular-expression-language-quick-reference has masses of characters etc to use, but I haven't managed to get a combination working yet.
The closest I've got was using [^Account.*](?:.*\s*){3} which returns
s Payable
Company Name
Some Road
I just can't get it to remove the rest of that line! Any help would be appreciated. Thanks.

You may use a ^ with multiline mode on:
(?m)^(?!Accounts)(?:.*\n?){2}
Or (a bit more efficient and following best practices):
(?m)^(?!Accounts).*(?:\n.*)?
See the regex demo and this regex demo.
When (?m) is added to the pattern, ^ matches the start of a line, and the pattern breaks down as follows:
^ - start of a line
(?!Accounts) - a negative lookahead that fails the match if the line starts with Accounts
(?:.*\n?){2} - two occurrences of any 0+ chars other than line break chars, each followed by an optional newline
.*(?:\n.*)? - matches a line and an optional subsequent line.
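As a quick check of the second pattern, here is a minimal sketch in Python rather than .NET (an assumption on my part; the question does not say which engine is used). The sample address and the helper name first_two_lines are made up for illustration.

```python
import re

# Second pattern from the answer; re.MULTILINE plays the role of (?m).
PATTERN = re.compile(r"^(?!Accounts).*(?:\n.*)?", re.MULTILINE)

def first_two_lines(address: str) -> str:
    """Return the first two lines, starting at the first line that does not begin with 'Accounts'."""
    match = PATTERN.search(address)
    return match.group(0) if match else ""

with_accounts = "Accounts Payable\nCompany Name\nSome Road\nSome Town"
without_accounts = "Company Name\nSome Road\nSome Town"

print(first_two_lines(with_accounts))
print(first_two_lines(without_accounts))
```

Both calls should print the Company Name and Some Road lines. Only the first match is used; a second search would pick up the remaining lines.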

Related

regex challenge

I've got a little challenge that's been bothering me for the past 2 days.
I have to check whether "From:" and "X-Sender:" have the same value using RegEx.
Problem:
From: some text
<someone#mail.com>
X-Sender: notthatmail.com
How could a RegEx check whether those two addresses match?
This is actually a mail in which I have to check the MIME headers for consistency.
You can use this:
From: .+?<(.+?)>.+?X-Sender: \1\b
If it matches, the two emails are the same.
Note that this requires the single line option to be on. If your regex flavour does not have a single line option, you can replace all the . with [\s\S] to achieve the same effect.
How this works:
It first finds the email address in the <> brackets and captures it into group 1. Then it continues to look for the word X-Sender:. Finally, it asserts that whatever is in group 1 (\1) must appear right after X-Sender:.
Demo
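To see the backreference at work, here is a small sketch in Python, where re.DOTALL stands in for the single line option. The header values and the function name sender_matches are invented for the example.

```python
import re

# re.DOTALL lets . cross the line breaks between the headers.
HEADER_RE = re.compile(r"From: .+?<(.+?)>.+?X-Sender: \1\b", re.DOTALL)

def sender_matches(headers: str) -> bool:
    """True when the address in <...> after From: is repeated after X-Sender:."""
    return HEADER_RE.search(headers) is not None

same = "From: some text\n<someone@mail.com>\nX-Sender: someone@mail.com"
different = "From: some text\n<someone@mail.com>\nX-Sender: other@mail.com"

print(sender_matches(same), sender_matches(different))
```

The first header block should match and the second should not, because \1 must repeat exactly what was captured between the angle brackets.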

Regex to MATCH number string (with optional text) in a sentence

I am trying to write a regex that matches only strings like this:
89-72
10-123
109-12
122-311(a)
22-311(a)(1)(d)(4)
These strings are embedded in sentences and sometimes there are 2 potential matches in the sentence like this:
In section 10-123 which references section 122-311(a) there is a phone number 456-234-2222
I do not want to match the phone. Here is my current working regex
\d{2,3}\-\d{2,3}(\([a-zA-Z0-9]\))*
see DEMO
I've been looking on Stack and have not found anything yet. Any help would be appreciated. I will be using this in a Google Sheet and potentially Postgres.
Based on the regex suggested by @Wiktor Stribiżew:
=REGEXEXTRACT(A1,REPT("\b(\d{2,3}-\d{2,3}\b(?:\([A-Za-z0-9]\))*)(?:[^-]|$)(?:.*)",LEN(REGEXREPLACE(REGEXREPLACE(A1,"\b(\d{2,3}-\d{2,3}\b(?:\([A-Za-z0-9]\))*)(?:[^-]|$)", char (9)),"[^"&char(9)&"]",""))))
The formula will return all matches.
String:
A
In 22-311(a)(1)(d)(4) section 10-123 which ... 122-311(a) ... number 456-234-2222
Output:
B C D
22-311(a)(1)(d)(4) 10-123 122-311(a)
Solution
To extract all matches from a string, use this pattern:
=REGEXEXTRACT(A1,
REPT(basic_regex & "(?:.*)",
LEN(REGEXREPLACE(REGEXREPLACE(A1,basic_regex, char (9)),"[^"&char(9)&"]",""))))
The tail of the formula:
LEN(REGEXREPLACE(REGEXREPLACE(A1,basic_regex, char (9)),"[^"&char(9)&"]","")))
only counts how many times the pattern occurs in the string (3 in this example).
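Outside of Sheets, the same basic_regex can be checked with any engine that returns all matches at once. The sketch below uses Python's re module, on the assumption that it behaves like RE2 for this pattern (the pattern uses no lookarounds). The variable names are illustrative only.

```python
import re

# basic_regex from the formula above; group 1 is the section number.
BASIC_REGEX = re.compile(r"\b(\d{2,3}-\d{2,3}\b(?:\([A-Za-z0-9]\))*)(?:[^-]|$)")

text = "In 22-311(a)(1)(d)(4) section 10-123 which ... 122-311(a) ... number 456-234-2222"

matches = BASIC_REGEX.findall(text)  # one string per match (group 1 only)
count = len(matches)                 # the number the LEN(REGEXREPLACE(...)) tail computes

print(count, matches)
```

This should report three matches and skip the phone number, mirroring columns B to D of the output above.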
To avoid matching the phone number, you have to state that the match must be neither preceded nor followed by \d or -. Google Sheets uses RE2, which does not support lookaround assertions (see the list of supported features), so as far as I can tell the only solution is to match one extra character, or the string boundary, before and after the match:
(?:^|[^-\d])\d{2,3}\-\d{2,3}(\([a-zA-Z0-9]\))*(?:$|[^-\d])
(?:^|[^-\d]) means either the start of a line (^) or a character that is neither - nor \d (you might want to change that and forbid all letters as well). $ is the end of a line. Note that ^ and $ only match at line boundaries with the /m flag; without it they match only at the start and end of the whole string.
As you can see here this finds the correct strings, but with additional spaces around some of the matches.
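One way to drop the extra surrounding characters is to wrap the wanted part in a capturing group and read only that group. A minimal sketch in Python (an assumption; Sheets and Postgres have their own ways of returning a group), with the repeated parenthesised part made non-capturing so that findall returns plain strings:

```python
import re

# Same pattern as above, with one capturing group around the part to keep.
SECTION_RE = re.compile(
    r"(?:^|[^-\d])(\d{2,3}-\d{2,3}(?:\([a-zA-Z0-9]\))*)(?:$|[^-\d])"
)

sentence = ("In section 10-123 which references section 122-311(a) "
            "there is a phone number 456-234-2222")

sections = SECTION_RE.findall(sentence)  # group 1 of every match
print(sections)
```

Because the boundary characters are consumed by the match, two section numbers separated by only a single space would not both be found with this approach.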

Extracting address with Regex

I'm trying to look for Street|St|Drive|Dr and then get the whole contents of the line to extract the address:
(?:(?!\s{2,}|\$).)*(Street|St|Drive|Dr).*?(?=\s{2,})
.. but it also matches:
Full match 420-442 ` Tax Invoice/Statement`
Group 1. 433-435 `St`
Full match 4858-4867 `163.66 DR`
Group 1. 4865-4867 `DR`
Full match 11053-11089 ` Permanent Water Saving Plan, please`
Group 1. 11077-11079 `Pl`
How do I match only whole words and not substrings, so that it ignores words that merely contain those words (the first match, for example)?
One option is to use the word-boundary anchor, \b, to accomplish this:
(?:(?!\s{2,}|\$).)*\b(Street|St|Drive|Dr)\b.*?(?=\s{2,})
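The effect of \b can be seen in isolation with a short Python sketch. The sample strings are hypothetical, and only the keyword part of the regex is tested, not the full address pattern.

```python
import re

KEYWORDS = r"(Street|St|Drive|Dr)"
loose = re.compile(KEYWORDS)                    # matches inside longer words too
strict = re.compile(r"\b" + KEYWORDS + r"\b")   # whole words only

samples = ["Tax Invoice/Statement", "12 Smith St", "7 Drury Lane"]

# Map each sample to (loose result, strict result).
results = {s: (bool(loose.search(s)), bool(strict.search(s))) for s in samples}
for sample, (is_loose, is_strict) in results.items():
    print(f"{sample!r}: loose={is_loose} strict={is_strict}")
```

The loose pattern hits all three samples, while the strict one should accept only the line that contains St as a word of its own.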
If you provide an example of the raw text you're parsing, I'll be able to give additional help if this doesn't work.
Edit:
From the link you posted in a comment, it seems that the \b solution solves your question:
How do i match only whole words and not substrings so it ignores words that contain those words (the first match for example).
However, it seems like there are additional issues with your regex.

How to exclude a certain word in regex?

I'm using this expression and it's perfect for what I need:
.*(cq|conquest).*
It returns any word/phrase/sentence/etc. with the letters 'cq' or the word 'conquest' in it. However, from those matches I want to exclude all that contain the term 'conquest power'.
Examples:
some conquest here (should match)
another cq with some conquest here (should match)
too much cq or conquest power is bad (should not match)
How can I do that to the regex above? It has to be only one regex otherwise the program that I'm using (Advanced Combat Tracker) will create two different tabs.
If you want to match any string which contains "conquest" or "cq", but not if the string contains "conquest power", then the regex is
^(?!.*conquest power).*?(?:cq|conquest).*
The above will attempt to match from the start of the string to the end of the line. If you want to match from the start of each line instead, switch on multiline mode if available; adding (?m) to the start of the regex may do that.
If you want to match across newlines change . to [\s\S], or switch on singleline mode if available.
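The three examples from the question can be checked against this regex with a few lines of Python (an assumption; Advanced Combat Tracker uses its own regex engine, and the variable names are illustrative):

```python
import re

PATTERN = re.compile(r"^(?!.*conquest power).*?(?:cq|conquest).*")

examples = [
    "some conquest here",
    "another cq with some conquest here",
    "too much cq or conquest power is bad",
]

# True where the line matches, False where the lookahead rejects it.
results = [PATTERN.search(line) is not None for line in examples]
print(results)
```

The first two lines should match and the third should be rejected by the negative lookahead.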
You have confused people by stating "I want to match 'cq' or 'conquest'" but also "I want the regex to extract that line".
I assume you don't really want to match just "cq" or "conquest", you want to match strings/lines (?) containing "cq" or "conquest".
From your original question I got that you want to match all strings which contain "cq" or "conquest" but do not contain "power". For this case the following regexp works:
^([^p]|p(?!ower))*(cq|conquest)([^p]|p(?!ower))*$
(regexpal)

Regex to match "Warm Regards"-type email signatures

I am an absolute regex noob and have been banging my head against the wall trying to write a regex to remove email signatures from a string that look like this:
Hi There, this is an email.
Warm Regards,
Joe Bloggs
Thus far, I’ve tried variations on:
/^[\w |][R|r]egards,/
The regex should:
look at the beginning of the line (what I was aiming for with the ^),
cover variations like “Warm Regards”, “Kind Regards”, “Best Regards”, and plain old “Regards” (which I was hoping to accomplish with the [\w |] to match any word or blank and the [R|r] to cover Regards/regards),
be OK with mixed case like “warm regards” or “Warm Regards”, and
only pickup lines that are [word] Regards or just regards, so that we don’t grab email body that has the word “regards” somewhere in it.
This seems elementary, but I just can’t nail it, and I seem to err on broadening my regex too much such that any line that contains “regards” gets picked up. I’m doing this in Node.js combined with the string.search function if that matters.
This seems to fit all your requirements:
^(\w*\s)?[rR]egards,?
It has to start at the beginning of a line (in JavaScript, ^ only matches at line starts when the m flag is set). It can then have any word followed by a space and the word regards, or just the word regards, with the comma being optional.
If you want to wipe out everything after the regards line as well, you can add \s*.*
^(\w*\s)?[rR]egards,?\s*.*
If you are trying to remove everything from the Warm Regards line on, this should do it
^[^<]*?(?=(.*)[R|r]egards)
Try the following regular expression
^\w* ?regards,?
with the case insensitive & global flag specified.
You can see the regular expression explanation and what it matches here: http://regex101.com/r/vR3zG5
The regular expression that matches the signatures defined in #1-#4 is the following:
/^(\w+ +)?regards,? *$/im
How it works:
"^" at the beginning means start of a line
"(\w+ +)?" means optional segment that contains exactly one word followed by at least one space
"regards" is just a simple match
",?" optional comma at the end
" *" - the line may contain trailing spaces (it may be useful to put the same match after ^)
"$" - end of line
/.../i - means that the expression is case-insensitive
/.../m - means that ^ and $ match at line breaks
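Put together, the expression can be used to cut an email off at the sign-off line. The question uses Node.js; the sketch below uses Python instead, where re.IGNORECASE and re.MULTILINE correspond to the i and m flags. The function name strip_signature is made up for the example.

```python
import re

SIGNOFF_RE = re.compile(r"^(\w+ +)?regards,? *$", re.IGNORECASE | re.MULTILINE)

def strip_signature(email: str) -> str:
    """Return the email text up to (not including) the sign-off line."""
    match = SIGNOFF_RE.search(email)
    if match is None:
        return email  # no sign-off line found, leave the text alone
    return email[:match.start()].rstrip()

email = "Hi There, this is an email.\nWarm Regards,\nJoe Bloggs"
print(strip_signature(email))
```

A body line such as "With regards to your email, thanks." is left alone, because the line does not end right after "regards".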