Regex to match a pattern followed by some string

I have the following text. I want to capture the pattern ddd-dd-ddd followed by all text up to the next ddd-dd-ddd.
I am trying to use this regex:
\b[0-9]{3}-[0-9]{2}-[0-9]{3}\b.*
It matches 982-99-122 followed by the rest of the sentence up to the line feed. Then the second number, 586-33-453, is matched followed by the text on the same line, but the match fails to capture the text that continues on the next line.
If I remove the line feeds from the string, the regex matches only once: it starts at the first number 982-99-122 and runs to the end of the string, so the second number 586-33-453 never gets its own match.
How should I fix both issues: 1. when line feeds are part of the string, and 2. when the string has no line feeds?
982-99-122 (FCC 333/22) lube oil service pump 1b discharge lube oil service pump
aaa bb dsdsd
586-33-453 Matches exactly 3 times 0-e single character in the range between
dfldfldflkdf 545-66-666 sdkjsl () jdfkjd-kfdkf sdfl
848-99-040 sdsd"" df
dfdf

It seems you want
\b([0-9]{3}-[0-9]{2}-[0-9]{3})\b([\s\S]*?)(?=\b[0-9]{3}-[0-9]{2}-[0-9]{3}\b|$)
Details
\b - word boundary
([0-9]{3}-[0-9]{2}-[0-9]{3}) - Group 1: 3 digits, -, 2 digits, - and 3 digits
\b - word boundary
([\s\S]*?) - Group 2: any 0+ chars, including line breaks, as few as possible
(?=\b[0-9]{3}-[0-9]{2}-[0-9]{3}\b|$) - a positive lookahead that requires 3 digits, -, 2 digits, - and 3 digits as a whole word, or the end of the string, immediately to the right of the current location. It consumes no text, so the next match can start at that number. Keep multiline mode off so that $ matches only at the end of the string.
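As a quick check, here is a minimal Python sketch of the pattern, written with an unquantified lookahead. The question does not name a regex flavor, so Python's `re` module is an assumption, and the names `PATTERN`, `split_records`, `records` and `flat_records` are mine. It runs the pattern over the sample text with line feeds, and again with the line feeds replaced by spaces.

```python
import re

# Number, then the lazily matched text up to the next number or end of string.
PATTERN = re.compile(
    r"\b([0-9]{3}-[0-9]{2}-[0-9]{3})\b([\s\S]*?)"
    r"(?=\b[0-9]{3}-[0-9]{2}-[0-9]{3}\b|$)"
)

text = (
    "982-99-122 (FCC 333/22) lube oil service pump 1b discharge lube oil service pump\n"
    "aaa bb dsdsd\n"
    "586-33-453 Matches exactly 3 times 0-e single character in the range between\n"
    "dfldfldflkdf 545-66-666 sdkjsl () jdfkjd-kfdkf sdfl\n"
    '848-99-040 sdsd"" df\n'
    "dfdf"
)


def split_records(s):
    """Return (number, following text) pairs; surrounding whitespace is trimmed."""
    return [(m.group(1), m.group(2).strip()) for m in PATTERN.finditer(s)]


# Case 1: line feeds are part of the string.
records = split_records(text)

# Case 2: the same text with the line feeds replaced by spaces.
flat_records = split_records(text.replace("\n", " "))

for number, rest in records:
    print(number, "->", repr(rest))
```

Both runs should find the same four numbers. `re.MULTILINE` is deliberately not set, so `$` only matches at the very end of the string.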

Related

How to match strings between two words but display also the first word

I want to match a specific string between two words and also display the first word. The string between the two words is “First Time : 10:10PM”.
I have a regex that matches between two words, but it displays everything between >Z and First Time.
https://regex101.com/r/iVBUCQ/1
Data:
qitjfjdjqkfjjf 1934848[*. {*}*}*#*#*#[#*
]*,qgvv]*?£[£?£,£~'_!~£[££<£<'<'? £]!<!<
['~£,'}'<',!']'',', <€~Z1234566789>Z12345667890
1'fncnr'qmtjcsmsj194&($.!:!
,$/&15?'?'(''(('(''158,$3,!!1
1'('(',';?1!( First Time : 10:10PM
1&4$,!;($qmfjccn1'fkfkckcqtngcnnq
AAABBB : ,$2$$(&158((&&,&,&;&(&&((&
Desired Result:
>Z12345667890 First Time : 10:10PM
You may use this regex with 2 capture groups:
\b[AZQ]\d{10,14}(>\S+).*?(First Time : \d\d:\d\d[AP]M)
\b[AZQ]\d{10,14}: Match a word boundary followed by one of the letters A, Z or Q and then 10 to 14 digits
(>\S+): Capture group #1 to match > followed by 1+ non-whitespace chars
.*?: Match any text, as little as possible; it crosses line breaks only when the dot is allowed to match newlines (single-line/DOTALL mode)
(First Time : \d\d:\d\d[AP]M): Capture group #2 to match First Time : followed by hour:minute and AM or PM
If the match must also be followed by the AAABBB marker, append .*?\sAAABBB\b to the pattern:
.*?: Match any text or line break
\s: Match a whitespace
AAABBB\b: Match AAABBB and word boundary
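For illustration, a small Python sketch of the two-group pattern against the data from the question. Python's `re` module is my assumption here, as are the variable names. `re.DOTALL` is set so that `.*?` can cross line breaks, and the two groups are joined with a space to build the desired result.

```python
import re

# Sample data copied from the question.
data = """qitjfjdjqkfjjf 1934848[*. {*}*}*#*#*#[#*
]*,qgvv]*?£[£?£,£~'_!~£[££<£<'<'? £]!<!<
['~£,'}'<',!']'',', <€~Z1234566789>Z12345667890
1'fncnr'qmtjcsmsj194&($.!:!
,$/&15?'?'(''(('(''158,$3,!!1
1'('(',';?1!( First Time : 10:10PM
1&4$,!;($qmfjccn1'fkfkckcqtngcnnq
AAABBB : ,$2$$(&158((&&,&,&;&(&&((&"""

# DOTALL lets .*? run across the line breaks between the two groups.
pattern = re.compile(
    r"\b[AZQ]\d{10,14}(>\S+).*?(First Time : \d\d:\d\d[AP]M)",
    re.DOTALL,
)

match = pattern.search(data)
result = f"{match.group(1)} {match.group(2)}" if match else None
print(result)
```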

RegEx to replace entire string with first two values

I'm trying to come up with a regex expression to replace an entire string with just the first two values. Examples:
Entire String: AO SMITH 100108283 4500W/240V SCREW-IN ELEMENT, 11"
First Two Values: AO SMITH
Entire String: BRA14X18HEBU / P11-042 / 310-470NL BRASS 1/4 x 1/8 HEX BUSHING
First Two Values: BRA14X18HEBU / P11-042
Entire String: TWO-HOLE PIPE STRAP 4" 008004EG 72E 4
First Two Values: TWO-HOLE PIPE
The caveat is that I want to preserve special characters such as "/" and "-" and not count them as separate values. The pattern I've written does not handle them: it leaves the new value entirely blank, and only the first example above works.
Here's what I've got so far:
Matching Value:
^(\w+) +(\w+).+$
New Value:
$1 $2
One option could be using a single capture group and use that in the replacement.
^(\w+(?:-\w+)?(?: +\/)? +\w+(?:-\w+)?).+
The pattern matches:
^ Start of string
( Capture group 1
\w+(?:-\w+)? Match 1+ word chars with an optional part to match a - and 1+ word chars
(?: +\/)? Optionally match 1+ spaces followed by /
 +\w+(?:-\w+)? Match 1+ spaces, then 1+ word chars with an optional part to match a - and 1+ word chars
) Close group 1
.+ Match 1+ times any char (the rest of the line)
If there can be more than 1 hyphen, you can use * instead of ?
Output
AO SMITH
BRA14X18HEBU / P11-042
TWO-HOLE PIPE
A broader match could be matching non word chars in between the words
^(\w+(?:-\w+)*[\W\r\n]+\w+(?:-\w+)*).+
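A short Python sketch of the single-capture-group replacement, run on the three example strings. This assumes Python's `re` module, which the question does not specify; there the replacement reference is written `\1` rather than `$1`, and `/` needs no escaping. The names `FIRST_TWO`, `samples` and `results` are mine.

```python
import re

# Group 1 holds the first two values; the trailing .+ swallows the rest.
FIRST_TWO = re.compile(r"^(\w+(?:-\w+)?(?: +/)? +\w+(?:-\w+)?).+")

samples = [
    'AO SMITH 100108283 4500W/240V SCREW-IN ELEMENT, 11"',
    "BRA14X18HEBU / P11-042 / 310-470NL BRASS 1/4 x 1/8 HEX BUSHING",
    'TWO-HOLE PIPE STRAP 4" 008004EG 72E 4',
]

# Replace the whole match with the captured first two values.
results = [FIRST_TWO.sub(r"\1", s) for s in samples]

for r in results:
    print(r)
```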

Regex to block more than 3 numbers in a string

I am trying to block any strings that contain more than 3 numbers and prevent special characters. I have the special characters part down. I'm just missing the number part.
For example:
"Hello 1234" - Not Allowed
"Hello 123" - Allowed
I've tried the following:
/^[!?., A-Za-z0-9]+$/
/((^[!?., A-Za-z]\d)([0-9]{3}+$))/
/^((\d){2}[a-zA-Z0-9,.!? ])*$/
The last one is the closest I got as it prevents any special characters and any numbers from being entered at all.
I've looked through previous posts, but am coming up short.
Edit for clarification
Essentially I'm trying to find a way to prevent customers from entering PII on a form. No submission should be allowed that contains more than 3 numbers in a string.
Hello1234 - Not allowed
12345 - Not allowed
1111 - not allowed
Nowhere in the comment section should the user be able to enter a string that contains more than 3 numbers in total.
About the patterns that you tried
^[!?., A-Za-z0-9]+$ The pattern matches 1+ times any of the listed, including 1 or more digits
((^[!?., A-Za-z]\d)([0-9]{3}+$)) If {3}+ is supported, the pattern matches a single char from the character class, 1 digit followed by 3 digits
^((\d){2}[a-zA-Z0-9,.!? ])*$ The pattern repeats 0+ times matching 2 digits and 1 of the listed in the character class
You can use a negative lookahead, if that is supported, to assert that the string does not contain 4 digits in a row.
^(?!.*\d{4})[a-zA-Z0-9,.!? ]+$
If the limit applies to the total number of digits (0 to 3 digits anywhere in the string) rather than to consecutive digits:
^[a-zA-Z,.!? ]*(?:\d[a-zA-Z,.!? ]*){0,3}$
Explanation
^ Start of string
[a-zA-Z,.!? ]* Match 0+ times any of the listed (without a digit)
(?:\d[a-zA-Z,.!? ]*){0,3} Repeat 0 - 3 times matching a single digit followed by optional listed chars (Again without a digit)
$ End of string
If you don't want to match an empty string and a lookahead is supported:
^(?!$)[a-zA-Z,.!? ]*(?:\d[a-zA-Z,.!? ]*){0,3}$
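To make the difference between the two readings concrete, here is a small Python sketch (Python's `re` module is an assumption; the names `NO_FOUR_IN_A_ROW`, `AT_MOST_THREE` and `is_allowed` are hypothetical). The first pattern only rejects 4 consecutive digits, while the second rejects any string with more than 3 digits in total.

```python
import re

# Rejects 4 digits in a row, but allows any number of separated digits.
NO_FOUR_IN_A_ROW = re.compile(r"^(?!.*\d{4})[a-zA-Z0-9,.!? ]+$")

# Allows at most 3 digits in total and rejects the empty string.
AT_MOST_THREE = re.compile(r"^(?!$)[a-zA-Z,.!? ]*(?:\d[a-zA-Z,.!? ]*){0,3}$")


def is_allowed(comment):
    """True if the comment has only allowed chars and at most 3 digits overall."""
    return AT_MOST_THREE.match(comment) is not None


for sample in ["Hello 123", "Hello 1234", "Hello1234", "12345", "1111", "1 2 3 4"]:
    print(repr(sample), is_allowed(sample))
```

The string "1 2 3 4" passes the first pattern but fails the second, which is the case to watch if the goal is to keep PII out of the form.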
Here is my two cents:
^(?!(.*\d){4})[A-Za-z ,.!?\d]+$
^ - Start string anchor.
(?! - Open a negative lookahead.
( - Open capture group.
.*\d - Match anything other than newline up to a digit.
){4} - Close capture group and match it 4 times.
) - Close negative lookahead.
[A-Za-z ,.!?\d]+ - 1+ Characters from specified class.
$ - End string anchor.
I think it should cover what you described.
Assuming you mean <= 3 digits, this may be a naive one, but how about
^[ALLOWED_CHARS]*[0-9]?[ALLOWED_CHARS]*[0-9]?[ALLOWED_CHARS]*[0-9]?[ALLOWED_CHARS]*$
Replace the [ALLOWED_CHARS] placeholder with a character class of whatever you consider neither a special character nor a digit.
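A sketch of this template in Python, with every digit slot optional and the pattern anchored at both ends. The value of `ALLOWED_CHARS` is an assumption borrowed from the character class in the question, and the names `TEMPLATE` and `at_most_three_digits` are hypothetical.

```python
import re

# Assumed set of non-digit characters the form accepts.
ALLOWED_CHARS = r"[A-Za-z ,.!?]"

# allowed* (digit? allowed*) x 3, anchored so the whole string must fit.
TEMPLATE = re.compile(
    "^" + ALLOWED_CHARS + "*" + ("[0-9]?" + ALLOWED_CHARS + "*") * 3 + "$"
)


def at_most_three_digits(s):
    """True if s uses only allowed chars and contains at most 3 digits."""
    return TEMPLATE.match(s) is not None


print(at_most_three_digits("Hello 123"))
print(at_most_three_digits("Hello 1234"))
```

Spelling the three optional digit slots out by hand is equivalent to the `{0,3}` repetition shown earlier; the quantifier version is easier to adjust if the limit changes.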

Regex: include 3 word in front and 3 behind the selected text

I'm using this regex formula in Excel to find the desired text in a paragraph:
=RegexExtract(B2,"(bot|vehicle|scrape)")
This formula successfully returns any of the 3 words found in a paragraph. As an extra, I would like the regex to return the matched word in bold, along with 3 words in front of it and 3 words after it.
Example of text:
A car (or automobile) is a wheeled motor vehicle used for transportation.
Most definitions of car say they run primarily on roads, seat one to eight people,
have four tires, and mainly transport people rather than goods.
Example output:
a wheeled motor **vehicle** used for transportation
I want a portion of the surrounding text to appear so that the receiver can pinpoint the location of the match more easily.
Any alternative approach is much appreciated.
You may use
=RegexExtract(B2,"((?:\w+\W+(?:\w+\W+){0,2})?(?:bot|vehicle|scrape)(?:\W+\w+(?:\W+\w+){0,2})?)")
Details: The pattern is enclosed with capturing parentheses to make REGEXEXTRACT actually extract the string you need that meets the following pattern:
(?:\w+\W+(?:\w+\W+){0,2})? - an optional sequence of a word followed with non-word chars that is followed with zero, one or two repetitions of 1+ word chars and then 1+ non-word chars
(?:bot|vehicle|scrape) - a bot, vehicle or scrape word
(?:\W+\w+(?:\W+\w+){0,2})? - an optional sequence of 1+ non-word chars and then 1+ word chars followed with zero, one or two repetitions of 1+ non-word chars and then 1+ word chars.
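To see what the pattern extracts outside of a spreadsheet, here is a minimal Python sketch run on the example text. Python's `re` module is my assumption, and I wrapped the keyword alternation in a named group `kw` (not part of the original pattern) so the keyword can be surrounded with `**` markers. Real bold formatting depends on the spreadsheet and is not covered here.

```python
import re

# Up to 3 words before and after the keyword; the named group marks the keyword.
CONTEXT = re.compile(
    r"(?:\w+\W+(?:\w+\W+){0,2})?(?P<kw>bot|vehicle|scrape)(?:\W+\w+(?:\W+\w+){0,2})?"
)

text = (
    "A car (or automobile) is a wheeled motor vehicle used for transportation. "
    "Most definitions of car say they run primarily on roads, seat one to eight people, "
    "have four tires, and mainly transport people rather than goods."
)

m = CONTEXT.search(text)
snippet = m.group(0)

# Rebuild the snippet with ** around the keyword only.
kw_start, kw_end = m.span("kw")
bolded = text[m.start():kw_start] + "**" + m.group("kw") + "**" + text[kw_end:m.end()]

print(snippet)
print(bolded)
```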

vba regular expression last occurrence

I would like to match the "775" (the last 3-digit number, out of an unknown number of occurrences) within the string "one 234 two 449 three 775 f4our", where "f4our" stands for an unknown number of characters (letters, digits, spaces, but never 3 or more digits in a row).
I came up with the regular expression "(\d{3}).*?$" thinking the "?" would suffice to get the 775 instead of the 234, but this doesn't seem to work.
Is there any way to accomplish this using VBA regular expressions?
Note that (\d{3}).*?$ just matches and captures into Group 1 the first 3 consecutive digits and then matches any 0+ characters other than a newline up to the end of the string.
You need to get the 3 digit chunk at the end of the string that is not followed with a 3-digit chunk anywhere after it.
You may use a negative lookahead (?!.*\d{3}) to impose a restriction on the match:
\d{3}(?!.*\d{3})
Or, if the 3 digits are to be matched as a whole word:
\b\d{3}\b(?!.*\b\d{3}\b)
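The question is about VBA, but the patterns can be checked quickly in Python, used here only as an illustration with variable names of my own. The sketch compares the original attempt with the two lookahead patterns on the sample string.

```python
import re

s = "one 234 two 449 three 775 f4our"

# Original attempt: the leftmost 3-digit run wins, and the lazy .*? still
# has to stretch to the end of the string.
first = re.search(r"(\d{3}).*?$", s).group(1)

# Negative lookahead: match 3 digits only if no other 3-digit run follows.
last = re.search(r"\d{3}(?!.*\d{3})", s).group(0)

# Whole-word variant of the same idea.
last_word = re.search(r"\b\d{3}\b(?!.*\b\d{3}\b)", s).group(0)

print(first, last, last_word)
```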