I would like to check whether a string contains at least two of a set of specific words.
E.g.:
Words to look for: good, day, hello
Match: What a good day
No match: What a day
Match: Hello and good day
So the string should contain at least two of these words for it to match.
For now I have: /(good)|(day)|(hello)/gmi
but that matches even if only one word is present.
Is that possible?
I guess you can brute-force all the pair combinations with positive lookaheads:
^(?:(?=.*good)(?=.*day)|(?=.*good)(?=.*hello)|(?=.*day)(?=.*hello)).*$
Notice that with lookaheads, the order in which the words appear doesn't matter.
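With more words the pairs get tedious to write by hand, so here is a minimal sketch that generates the same kind of pattern from a word list. Python, `itertools.combinations` and the helper name `at_least_two_pattern` are my own choices for illustration, not part of the answer. Case-insensitive matching is assumed (the question uses the `i` flag), and like the original pattern it matches substrings, so "goodbye" counts as "good".

```python
import re
from itertools import combinations

def at_least_two_pattern(words):
    """Build ^(?:(?=.*w1)(?=.*w2)|...).*$ with one lookahead pair per word pair."""
    branches = [
        "(?=.*{})(?=.*{})".format(re.escape(a), re.escape(b))
        for a, b in combinations(words, 2)
    ]
    return "^(?:" + "|".join(branches) + ").*$"

pattern = at_least_two_pattern(["good", "day", "hello"])
print(pattern)

# Case-insensitive so that "Hello" counts as "hello".
regex = re.compile(pattern, re.IGNORECASE)
for text in ["What a good day", "What a day", "Hello and good day"]:
    print(repr(text), bool(regex.match(text)))
```

The number of branches grows as n·(n-1)/2, which is fine for a handful of words.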
Edit:
Assuming text like "good good" should not count, and inspired by Michał Turczyn's answer, it can be as simple as:
^.*(good|day|hello).*(?!\1)(?1).*$
Or, if you're using a non-PCRE regex engine such as JavaScript (which has no (?1) subroutine calls):
^.*(good|day|hello).*(?!\1)(?:good|day|hello).*$
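Python's standard `re` module also lacks `(?1)`, so the non-PCRE form is the one to use there. A minimal check, assuming case-insensitive matching as in the question; the helper name `has_two_different_words` is hypothetical:

```python
import re

# (?!\1): the second word may not start with the same text captured in group 1,
# so a repeated word such as "good good" is rejected.
PATTERN = re.compile(r"^.*(good|day|hello).*(?!\1)(?:good|day|hello).*$", re.IGNORECASE)

def has_two_different_words(text):
    return PATTERN.match(text) is not None

for text in ["What a good day", "What a day", "Hello and good day", "good good"]:
    print(repr(text), has_two_different_words(text))
```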
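You could try (with the case-insensitive flag, so that "Hello" counts):
^.*(?:good|day|hello).*(?:good|day|hello).*$
Note that this also matches the same word twice, e.g. "good good".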
Explanation:
^ - match beginning of a line
.* - match zero or more of any characters (except line breaks)
(?:...) - non-capturing group
good|day|hello - alternation, match one word from the list
$ - match end of a line
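A quick way to see the behaviour described above, sketched in Python with `re.IGNORECASE` standing in for the `i` flag (my assumption); it shows that a repeated word is accepted by this simpler pattern:

```python
import re

# Two occurrences of any listed word, in any order; the same word twice also counts.
PATTERN = re.compile(r"^.*(?:good|day|hello).*(?:good|day|hello).*$", re.IGNORECASE)

for text in ["What a good day", "What a day", "Hello and good day", "good good"]:
    print(repr(text), PATTERN.match(text) is not None)
```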
The base string looks like this:
repeatedRandomStr ABCXYZ /an/arbitrary/##-~/sequence/of_characters=I+WANT+TO+MATCH/repeatedRandomStr/the/rest/of/strings.etc
The things I know about this base string are:
ABCXYZ is constant and always present.
repeatedRandomStr is random, but its first occurrence is always at the beginning, before ABCXYZ.
So far I have looked at regex context matching, recursion and subroutines, but couldn't come up with a solution myself.
My current working solution is to first determine what repeatedRandomStr is with:
^(.*)\sABCXYZ
and then use:
repeatedRandomStr\sABCXYZ\s(.*)\srepeatedRandomStr
to match what I want in $1. But this requires two separate regex queries. I want to know if this can be done in a single execution.
In Go, which uses the RE2 library, there is no way other than yours: first extract the value before ABCXYZ, then use a second regex to match the string between the two occurrences, because RE2 does not (and will not) support backreferences.
If the regex flavor can be switched to PCRE or a compatible one, you can use
^(.*?)\s+ABCXYZ\s(.*)\1
or, to end Group 2 at the first (rather than the last) later occurrence of Group 1:
^(.*?)\s+ABCXYZ\s(.*?)\1
Details:
^ - start of string
(.*?) - Group 1: zero or more chars other than line break chars, as few as possible
\s+ - one or more whitespaces
ABCXYZ - some constant string
\s - a whitespace
(.*) - Group 2: zero or more chars other than line break chars, as many as possible (as few as possible with (.*?))
\1 - the same value as in Group 1.
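Python's `re` module supports backreferences (unlike RE2), so the single-pass version can be tried there directly. A minimal sketch using the lazy variant on the sample string from the question:

```python
import re

s = ("repeatedRandomStr ABCXYZ /an/arbitrary/##-~/sequence/"
     "of_characters=I+WANT+TO+MATCH/repeatedRandomStr/the/rest/of/strings.etc")

# Group 1: the prefix before ABCXYZ; \1 requires that same text to appear again later.
m = re.search(r"^(.*?)\s+ABCXYZ\s(.*?)\1", s)
if m:
    print("prefix: ", m.group(1))
    print("between:", m.group(2))
```

Note that Group 2 keeps the trailing "/" that precedes the second occurrence, because the pattern only stops at \1 itself.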
For example, I have these strings:
APPLEJUCE1A
APPLETREE2B
APPLECAKE3C
APPLETEA1B
APPLEWINE3B
APPLEWINE1C
I want all of these strings except those that contain TEA or WINE1C, i.e.:
APPLEJUCE1A
APPLETREE2B
APPLECAKE3C
APPLEWINE3B
I've already tried the following, but it didn't work:
^APPLE(?!.*(?:TEA|WINE1C)).*$
Any help is appreciated, as I'm fairly new to this.
If you indeed have multiple strings as you claim, there's no need to jam all that into one regex pattern:
/^APPLE/ && !/TEA|WINE1C/
If you have a single string, the best approach is probably to split it into lines (split /\n/), but you could also use a single regex match:
/^APPLE(?!.*(?:TEA|WINE1C)).*/mg
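The "two simple tests per line" idea translates directly to other languages. A minimal Python sketch of the Perl expression `/^APPLE/ && !/TEA|WINE1C/`, assuming the data arrives as one newline-separated string:

```python
import re

data = """APPLEJUCE1A
APPLETREE2B
APPLECAKE3C
APPLETEA1B
APPLEWINE3B
APPLEWINE1C"""

# re.match anchors at the start of the line (like ^APPLE);
# re.search looks anywhere in the line (like !/TEA|WINE1C/).
kept = [
    line for line in data.splitlines()
    if re.match(r"APPLE", line) and not re.search(r"TEA|WINE1C", line)
]
print(kept)
```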
You can use
^APPLE(?!.*TEA)(?!.*WINE1C).*
Details:
^ - start of string
APPLE - a fixed string
(?!.*TEA) - no TEA allowed anywhere to the right of the current location
(?!.*WINE1C) - no WINE1C allowed anywhere to the right of the current location
.* - any zero or more chars other than line break chars as many as possible.
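When all lines are in one string, the two-lookahead pattern needs multiline mode so that ^ matches at each line start. A short Python check (my choice of language; `re.MULTILINE` plays the role of the `m` flag):

```python
import re

data = "APPLEJUCE1A\nAPPLETREE2B\nAPPLECAKE3C\nAPPLETEA1B\nAPPLEWINE3B\nAPPLEWINE1C"

# "." does not cross newlines, so each lookahead only inspects the current line.
PATTERN = re.compile(r"^APPLE(?!.*TEA)(?!.*WINE1C).*", re.MULTILINE)
print(PATTERN.findall(data))
```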
If you only want to reject strings that contain both of them (which does not occur in the current example data):
^APPLE(?!.*(WINE1C|TEA).*(?!\1)(?:TEA|WINE1C)).*
Explanation
^ Start of string
APPLE match literally
(?! Negative lookahead
.*(WINE1C|TEA) Capture either one of the values in group 1
.* Match 0+ characters
(?!\1)(?:TEA|WINE1C) Match either one of the values as long as it is not the same as previously matched in group 1
) Close the lookahead
.* Match the rest of the line
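A few hypothetical inputs to confirm that only lines containing both values are rejected, in either order. This is a Python sketch; `match()` is used instead of `findall()` because the pattern contains a capture group, and `findall()` would return the group contents instead of the whole match.

```python
import re

# Reject only if TEA and WINE1C both appear (the (?!\1) forbids the same value twice).
PATTERN = re.compile(r"^APPLE(?!.*(WINE1C|TEA).*(?!\1)(?:TEA|WINE1C)).*")

def accepted(line):
    return PATTERN.match(line) is not None

for line in ["APPLETEA1B", "APPLEWINE1C", "APPLETEAWINE1C", "APPLEWINE1CTEA"]:
    print(line, accepted(line))
```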
I have the following data:
SOMEDATA .test 01/45/12 2.50 THIS IS DATA
and I want to extract the number 2.50 out of this. I have managed to do this with the following RegEx:
(?<=\d{2}\/\d{2}\/\d{2} )\d+.\d+
However that doesn't work for input like this:
SOMEDATA .test 01/45/12 2500 THIS IS DATA
In this case, I want to extract the number 2500.
I can't seem to figure out a regex rule for that. Is there a way to extract something between two spaces, i.e. the text/number after the date up to the next whitespace? All I know is that the date will always have the same format and that there will always be a space before and after the number I want to extract.
Can someone help me out with this?
Capture number between two whitespaces
A whitespace is matched with \s, and non-whitespace with \S.
So, what you can use is:
\d{2}\/\d{2}\/\d{2} +(\S+)
The one or more non-whitespace characters after the date are captured into Group 1.
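A minimal Python sketch of the capture-group approach on both inputs from the question. Python does not require "/" to be escaped, so the slashes are written plainly; the helper name `value_after_date` is hypothetical.

```python
import re

# Date-like part, one or more spaces, then capture the next run of non-whitespace.
PATTERN = re.compile(r"\d{2}/\d{2}/\d{2} +(\S+)")

def value_after_date(line):
    m = PATTERN.search(line)
    return m.group(1) if m else None

print(value_after_date("SOMEDATA .test 01/45/12 2.50 THIS IS DATA"))
print(value_after_date("SOMEDATA .test 01/45/12 2500 THIS IS DATA"))
```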
If, for some reason, you need only the value as the whole match, use your lookbehind approach:
(?<=\d{2}\/\d{2}\/\d{2} )\S+
Or, if you are using PCRE, you can use the match reset operator \K:
\d{2}\/\d{2}\/\d{2} +\K\S+
NOTE: the \K and capture group approaches allow one or more spaces after the date and are thus more flexible.
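Python's standard `re` module has no `\K`, but it does support fixed-width lookbehind, so the whole-match variant works there. This sketch also shows the limitation from the note: with two spaces after the date, the lookbehind version finds nothing (the second input is a hypothetical example).

```python
import re

# Fixed-width lookbehind: exactly one space after the date.
PATTERN = re.compile(r"(?<=\d{2}/\d{2}/\d{2} )\S+")

lines = [
    "SOMEDATA .test 01/45/12 2.50 THIS IS DATA",
    "SOMEDATA .test 01/45/12 2500 THIS IS DATA",
]
values = [PATTERN.search(line).group(0) for line in lines]
print(values)

# Two spaces after the date: the lookbehind cannot absorb the extra space.
print(PATTERN.search("SOMEDATA .test 01/45/12  2500 THIS IS DATA"))
```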
I see some people have helped you already, but if you want an alternative for some reason, this works too :)
.+ \d+\/\d+\/\d+ (\d+[\.\d]*)
The .+ followed by a space matches everything before the date, plus that space.
Then \d+\/\d+\/\d+ followed by a space matches the date and the space after it.
The capturing group is the number; as you can see, I made the last part optional, so both decimal values and integers can be matched. Hope this helps!
Proof: https://regex101.com/r/fY3nJ2/1
Just make the fractional part optional:
(?<=\d{2}\/\d{2}\/\d{2} )\d+(?:\.\d+)?
Demo: https://regex101.com/r/jH3pU7/1
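A short Python check of the optional fractional part. Unlike the `\S+` variants, this only accepts numbers, so a non-numeric field after the date (the "N/A" line is a hypothetical input) gives no match.

```python
import re

# Integer part, then an optional ".digits" fractional part.
PATTERN = re.compile(r"(?<=\d{2}/\d{2}/\d{2} )\d+(?:\.\d+)?")

for line in ["SOMEDATA .test 01/45/12 2.50 THIS IS DATA",
             "SOMEDATA .test 01/45/12 2500 THIS IS DATA",
             "SOMEDATA .test 01/45/12 N/A THIS IS DATA"]:
    m = PATTERN.search(line)
    print(m.group() if m else None)
```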
Update following clarifications in comments:
To match any run of non-space characters that follows the date and a space, use:
(?<=\d{2}\/\d{2}\/\d{2} )\S+
Demo: https://regex101.com/r/jH3pU7/3
Rather than capture, you can make your entire match be the target text by using a look behind:
(?<=\d\d(\/\d\d){2} )\S+
This matches the first series of non-whitespace that follows a "date like" part.
Note also the reduction in the length of the "date like" pattern. You may consider using this part of the regex in whatever solution you use.
Maybe this is easy, but I could not find a solution.
I am working in SAS 9.3 with Perl regex.
I am searching for a regular expression that matches certain words only when they are not followed by a specific other word. For example, it should match any text that contains "the car" where no "not" appears anywhere after it. (Case can be ignored, because I upcase everything in my code.)
Should match:
This is not the car i want
The car is green
Should not match:
The car is not green
This is the car i want, but its not available
One solution would be to split it in two matches:
prxmatch("/The car/",mytext) > 0 and prxmatch("/The car.+not/",mytext)=0
But I have to use this logic many times, including in more complex cases, so I don't want to always use two prxmatch calls; I'd rather combine the logic into a single prxmatch.
I read a lot about lookaheads and tried some examples, but they did not work correctly, e.g.:
"/The Car.+[^(not)]/"
or
"/The Car.+(?!not)/"
or
"/^(?!.*not.*).*?The car.*$/"
The first and second return all four texts as matches; the third matches none of them.
So can somebody provide a solution for this: a simple "not" operator for a word, or a correct lookahead/lookbehind approach?
You can use
(?im)^.*\bthe car\b(?!.*\bnot\b).*
Pattern breakdown:
(?im) - enable case-insensitive and multiline matching modes
^ - start of a line (since (?m) is used)
.* - match 0+ any characters but a newline
\bthe car\b - 2 whole words "the car" (a sequence of 2 words)
(?!.*\bnot\b) - a negative lookahead that fails the match if there is a whole word "not" somewhere to the right of the car
.* - the rest of the line up to the newline or end of string
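SAS PRX functions use Perl-style regex; as a rough cross-check outside SAS, here is the same pattern run through Python's `re` against the four sample texts. This assumes Python's handling of `\b`, lookahead and inline flags matches Perl for these simple inputs; note that Python requires the `(?im)` group at the very start of the pattern.

```python
import re

PATTERN = re.compile(r"(?im)^.*\bthe car\b(?!.*\bnot\b).*")

should_match = ["This is not the car i want", "The car is green"]
should_not_match = ["The car is not green",
                    "This is the car i want, but its not available"]

for text in should_match + should_not_match:
    print(repr(text), PATTERN.search(text) is not None)
```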
I need some help with a RegEx pattern match.
How do I write a regex that matches
N-NN-N-NN-NN-N-NNN
but also
N-NN-NN-NN
Example:
10pcs- ratchet spanner combination wrench 6-8-10-11-12-13-14-15-17-19
Cr-v,heated 12pcs-1/4dr 4-4.5-5-5.5-6-7-8-9-10-11-12-13 Cr-v,heated
17pcs-1/2dr 10-11-12-13-14-15-16-17-18-19-20-21-22-23-24-27-30
Cr-v,heated 1-2-33 Cr-V heater 1-.2-1-4
It should only match where there are at least two hyphens in the sequence, so a phone number like 020-11223344 should not be matched.
The strings almost always look like 6-8-10-11-12-13-14-15-17-19, except that sometimes a . can appear before a number, and they differ in length. Is this possible?
I came up with this so far, but it also matches phone numbers, and when a . appears it doesn't match at all:
(\d-[^>])
On this page you can find the different patterns: http://www.cazoom.nl/en/partij-aanbod/186-pcs-working-tools-trolly-3
What about this pattern:
[\d.]+(?:-[\d.]+){2,}
Matches [\d.]+ if it is followed by at least two repetitions of -[\d.]+.
(?:...) is a non-capturing group, used here only for the repetition.
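A quick Python check on a string assembled from pieces of the question (the mix of fragments is my own test input): the size sequences are found, including the one with a leading dot, while the phone number with a single hyphen is skipped.

```python
import re

# A digit/dot run followed by at least two more "-digits/dots" segments.
PATTERN = re.compile(r"[\d.]+(?:-[\d.]+){2,}")

text = ("wrench 6-8-10-11-12-13-14-15-17-19 Cr-v,heated "
        "phone 020-11223344 heater 1-.2-1-4")
print(PATTERN.findall(text))
```

Because the group is non-capturing, `findall` returns the whole matches rather than the last repetition.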
The following regex will match the thing.
(?:\.?\d\.?\d?-){2,}\.?\d\.?\d?
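Just try the following regex (the alternatives are wrapped in a non-capturing group so that ^ and $ apply to both):
^(?:\d-\d{2}-\d{2}-\d{2}|\d-\d{2}-\d-\d{2}-\d{2}-\d-\d{3})$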
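If only the two exact formats N-NN-NN-NN and N-NN-N-NN-NN-N-NNN are wanted, a strict version needs its alternatives grouped, since ^ and $ otherwise bind to just one side of the |. A Python sketch with hypothetical sample values:

```python
import re

# N = one digit; both formats anchored at start and end.
PATTERN = re.compile(r"^(?:\d-\d{2}-\d{2}-\d{2}|\d-\d{2}-\d-\d{2}-\d{2}-\d-\d{3})$")

for s in ["1-22-33-44", "1-22-3-44-55-6-777", "1-22-33", "020-11223344"]:
    print(s, bool(PATTERN.match(s)))
```

This does not cover the longer size lists or the leading dots from the example text; it only checks the two formats given at the top of the question.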