Regex to match any combination of 3 terms

I need to design a regex which will match any combination of n words, without duplicates.
E.g. the regex for the words "she", "is" and "happy" would match "she is happy" and "happy she is", but not "she is is happy" or "she is".
Can I do this with a regex, or should I use a custom algorithm?

This matches "she", "is" and "happy" in any order, but not when a word is duplicated:
^(?=(?:(?!\bshe\b).)*\bshe\b(?:(?!\bshe\b).)*$)(?=(?:(?!\bis\b).)*\bis\b(?:(?!\bis\b).)*$)(?=(?:(?!\bhappy\b).)*\bhappy\b(?:(?!\bhappy\b).)*$).*$
Let's explain the first part, (?=(?:(?!\bshe\b).)*\bshe\b(?:(?!\bshe\b).)*$).
It makes sure there is one and only one "she" anywhere in the phrase.
(?=               # start lookahead
  (?:             # non-capturing group
    (?!\bshe\b)   # negative lookahead: "she" does not start here
    .             # any character
  )*              # end group, repeated 0 or more times
  \bshe\b         # literally "she", surrounded by word boundaries
  (?:             # non-capturing group
    (?!\bshe\b)   # negative lookahead: "she" does not start here
    .             # any character
  )*              # end group, repeated 0 or more times
  $               # end of string
)                 # end lookahead
Same explanation for the other words "is" and "happy".
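Since the question asks about n words, the same construction can be generated from a word list. A minimal JavaScript sketch; `escapeRegExp` and `exactlyOnceRegex` are hypothetical helper names, and it assumes each word consists only of word characters so that `\b` marks its edges:

```javascript
// Hypothetical helper: escape characters that are special inside a regex.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: one lookahead per word, each requiring that word
// exactly once (assumes words are made of \w characters, so \b fits).
function exactlyOnceRegex(words) {
  const lookaheads = words.map((word) => {
    const w = "\\b" + escapeRegExp(word) + "\\b";
    // any text without w, then w, then any text without w up to the end
    return "(?=(?:(?!" + w + ").)*" + w + "(?:(?!" + w + ").)*$)";
  });
  return new RegExp("^" + lookaheads.join("") + ".*$");
}

const re = exactlyOnceRegex(["she", "is", "happy"]);
for (const s of ["she is happy", "happy she is", "she is is happy", "she is"]) {
  console.log(JSON.stringify(s), re.test(s)); // true, true, false, false
}
```

Like the hand-written pattern, this only constrains the listed words; other words are still allowed, so "she is very happy" also matches.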


Regex re negative lookahead doesn't exclude multiple characters successfully

There are five examples below, and I am trying to match 3, 4 and 5 while excluding 1 and 2.
1. ABC-abc
2. abc-ABC
3. ABC-ABC
4. ABC
5. vABC-ABC-ABCv
The current expression I use is:
(?!(\w*[A-Z]{2,}-[a-z]+\w*|\w*[a-z]+-[A-Z]{2,}\w*))(\w*-?[A-Z]{2,}-?\w*)
I use (\w*-?[A-Z]{2,}-?\w*) to first match all of the examples.
I then use (?!...|...) to add two exclusion conditions.
The first exclusion condition is \w*[A-Z]{2,}-[a-z]+\w* and the second is \w*[a-z]+-[A-Z]{2,}\w*.
This expression excludes example 1 (ABC-abc) but not example 2 (abc-ABC).
I searched a lot and found some people saying this is not something regex is "good" at. Is there any solution or improvement I can make to get rid of abc-ABC?
Appreciate any help or opinion.
As I understand it, strings are to be rejected if they contain a hyphen that is preceded by a lower-case letter and followed by an upper-case letter, or vice versa; otherwise they are to be accepted. If so, the following regular expression could be used.
^(?!.*(?:[a-z]-[A-Z]|[A-Z]-[a-z]))
The regex engine performs the following operations.
^ # match beginning of line
(?! # begin a negative lookahead
.* # match 0+ characters
(?: # begin a non-capture group
[a-z]-[A-Z] # match a lc letter, '-', uc letter
| # or
[A-Z]-[a-z] # match an uc letter, '-', lc letter
) # end non-capture group
) # end negative lookahead
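A quick JavaScript sketch that runs this expression against the five strings from the question (the `examples` array just holds them; default flags are assumed):

```javascript
// Reject a string if any hyphen has a lower-case letter on one side
// and an upper-case letter on the other.
const mixedCaseHyphen = /^(?!.*(?:[a-z]-[A-Z]|[A-Z]-[a-z]))/;

const examples = ["ABC-abc", "abc-ABC", "ABC-ABC", "ABC", "vABC-ABC-ABCv"];
for (const s of examples) {
  // The match itself is zero-length; test() only reports success or failure.
  console.log(s, mixedCaseHyphen.test(s));
}
// → ABC-abc false, abc-ABC false, ABC-ABC true, ABC true, vABC-ABC-ABCv true
```

Because the match is empty, the pattern works as a filter; if you also need the matched text, append something like .* after the lookahead.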

What is the regex formula to match two groups of words in a body of text

Please show me the formula to match "alpha" OR "beta" if "delta" OR "gamma" is included in the body of text.
Text example:
James is alpha but not gamma but he may also be delta
This should be a match because "alpha" is in the text as well as "gamma".
It should also match because "alpha" is in the text as well as "delta".
The match formula should also apply if "alpha" was replaced by "beta" in the text example.
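Depending on your regex flavour, this works for you (it is written in free-spacing mode, i.e. with the x flag, so whitespace and # comments are ignored):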
^ # beginning of line
(?= # start lookahead, a zero-length assertion that makes sure "delta" or "gamma" appears within the line
.* # 0 or more any character but newline
\b # word boundary
(?: # start non capture group
delta # literally "delta"
| # OR
gamma # literally "gamma"
) # end group
\b # word boundary
) # end lookahead
.* # 0 or more any character but newline
\b # word boundary
( # start group 1
alpha # literally "alpha"
| # OR
beta # literally "beta"
) # end group
\b # word boundary
.* # 0 or more any character but newline
$ # end of line
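JavaScript has no free-spacing flag, so as a sketch the same pattern is flattened onto one line here (the flattening is an assumption about how you would embed it; the test strings other than the question's sentence are made up):

```javascript
// One-line form of the commented pattern: the lookahead requires "delta"
// or "gamma" somewhere, then group 1 captures "alpha" or "beta".
const re = /^(?=.*\b(?:delta|gamma)\b).*\b(alpha|beta)\b.*$/;

const text = "James is alpha but not gamma but he may also be delta";
const m = text.match(re);
console.log(m ? m[1] : null); // → alpha
console.log(re.test(text.replace("alpha", "beta"))); // → true
console.log(re.test("James is alpha")); // → false (no delta or gamma)
```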
If you need to match the pairs in either order, you can use lookahead assertions:
^(?=.*\b(?:alpha|beta)\b)(?=.*\b(?:gamma|delta)\b).*
Test it live on regex101.com.
Explanation:
Each lookahead checks that one of the two terms is present somewhere in the string. Both lookaheads need to succeed in order for the match to proceed. The .* at the end is not strictly necessary (just to visualize the match in the regex tester); if you only need to check for match/non-match, then you can remove it. In that case, the match result will be an empty string.
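To illustrate the point about the trailing .*, a small JavaScript sketch comparing the pattern with and without it (the sample strings are made up):

```javascript
// Both lookaheads must succeed; the trailing .* only makes the match visible.
const withTail = /^(?=.*\b(?:alpha|beta)\b)(?=.*\b(?:gamma|delta)\b).*/;
const checkOnly = /^(?=.*\b(?:alpha|beta)\b)(?=.*\b(?:gamma|delta)\b)/;

const s = "gamma comes before beta here";
console.log(s.match(withTail)[0]); // → gamma comes before beta here
console.log(JSON.stringify(s.match(checkOnly)[0])); // → "" (zero-length match)
console.log(checkOnly.test("only delta here")); // → false
```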

Regex code, Python-2 alphanumeric [duplicate]

My regex knowledge is pretty limited, but I'm trying to write/find an expression that will capture the following string types in a document:
DO match:
ADY123
AD12ADY
1HGER_2
145-DE-FR2
Bicycle1
2Bicycle
128D
128878P
DON'T match:
BICYCLE
183-329-193
3123123
Is such an expression possible? Basically, it should find any string containing letters AND digits, regardless of whether the string contains a dash or underscore. I can find the first two using the following two regexes:
/([A-Z][0-9])\w+/g
/([0-9][A-Z])\w+/g
But searching for possible dashes and underscores makes it more complicated...
Thanks for any help you can provide! :)
MORE INFO:
I've made slight progress with: ([A-Z|a-z][0-9]+-*_*\w+) but it doesn't capture strings with more than one hyphen.
I had a document with a lot of text strings and number strings, which I don't want to capture. What I do want is any product code, which could be any length string with or without hyphens and underscores but will always include at least one digit and at least one letter.
You can use the following expression with the case-insensitive mode:
\b((?:[a-z]+\S*\d+|\d\S*[a-z]+)[a-z\d_-]*)\b
Explanation:
\b # Assert position at a word boundary
( # Beginning of capturing group 1
(?: # Beginning of the non-capturing group
[a-z]+\S*\d+ # Match letters followed by numbers
| # OR
\d\S*[a-z]+ # Match numbers followed by letters
) # End of the group
[a-z\d_-]* # Match letter, digit, '_', or '-' 0 or more times
) # End of capturing group 1
\b # Assert position at a word boundary
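A JavaScript sketch that runs the expression over a sample text built from the question's examples; the `i` flag provides the case-insensitive mode and `g` is required by `matchAll`:

```javascript
// Letters ... digits, or digits ... letters, then any run of letters,
// digits, '_' or '-'; the i flag makes [a-z] match upper case too.
const productCode = /\b((?:[a-z]+\S*\d+|\d\S*[a-z]+)[a-z\d_-]*)\b/gi;

const doc = "ADY123 AD12ADY 1HGER_2 145-DE-FR2 Bicycle1 2Bicycle 128D 128878P " +
            "BICYCLE 183-329-193 3123123";
const codes = [...doc.matchAll(productCode)].map((m) => m[1]);
console.log(codes.join(" "));
// → ADY123 AD12ADY 1HGER_2 145-DE-FR2 Bicycle1 2Bicycle 128D 128878P
```

Note that \S* accepts any non-space character, so punctuation between the letters and digits is also accepted (e.g. abc,1 is matched as a single code).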

Regular expression captures unwanted string

I have created the following expression: (.NET regex engine)
((-|\+)?\w+(\^\.?\d+)?)
hello , hello^.555,hello^111, -hello,+hello, hello+, hello^.25, hello^-1212121
It works well except that:
it captures the term 'hello+' without the '+'; this term should not be captured at all
it captures the last term 'hello^-1212121' as two matches, 'hello' and '-1212121'; both should be ignored
The strings to capture are as follows:
a word can have a + or a - before it
or a word can have a ^ followed by a positive number (not necessarily an integer)
words are separated by commas and any number of white spaces (neither is part of the capture)
A few examples of valid strings to capture :
hello^2
hello^.2
+hello
-hello
hello
EDIT
I have found the following expression, which effectively captures all these terms. It's not really optimized, but it just works:
([a-zA-Z]+(?= ?,))|((-|\+)[a-zA-Z]+(?=,))|([a-zA-Z]+\^\.?\d+)
OK, there are some issues to tackle here:
((-|+)?\w+(\^.?\d+)?)
The + inside (-|+) and the . after \^ are unescaped.
They should be escaped like this:
((-|\+)?\w+(\^\.?\d+)?)
Now, you'll also get -1212121 there. If your words always consist of letters only, change \w to [a-zA-Z]:
((-|\+)?[a-zA-Z]+(\^\.?\d+)?)
\w includes letters, numbers and underscore. So, you might want to restrict it down a bit to only letters.
And finally, to avoid capturing the invalid terms at all, you'll have to use lookarounds. I don't know of any other way to check the delimiters without including them in the matches:
(?<=^|,)\s*((-|\+)?[a-zA-Z]+(\^\.?\d+)?)\s*(?=,|$)
EDIT: If it cannot be something like -hello^2, and if another valid string is hello^9.8, then this one will fit better:
(?<=^|,)\s*((?:-|\+)?[a-zA-Z]+|[a-zA-Z]+\^(?:\d+)?\.?\d+)(?=\s*(?:,|$))
And lastly, if capturing the words is sufficient, we can remove the lookarounds:
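([a-zA-Z]+\^(?:\d+)?\.?\d+|[-+]?[a-zA-Z]+)
The ^ alternative has to come first here; otherwise [-+]?[a-zA-Z]+ matches hello in hello^.555 and stops there. Without the lookarounds, hello+ and hello^-1212121 still yield a partial match (hello).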
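The question targets .NET, but JavaScript (ES2018 and later) also supports lookbehind, so the lookaround version from the EDIT can be sketched there against the question's sample string:

```javascript
// Lookaround version: a term must follow the start or a comma, and be
// followed by optional spaces and then a comma or the end of the string.
const term = /(?<=^|,)\s*((?:-|\+)?[a-zA-Z]+|[a-zA-Z]+\^(?:\d+)?\.?\d+)(?=\s*(?:,|$))/g;

const src = "hello , hello^.555,hello^111, -hello,+hello, hello+, hello^.25, hello^-1212121";
// Group 1 holds the term without the leading spaces consumed by \s*.
const terms = [...src.matchAll(term)].map((m) => m[1]);
console.log(terms.join(" | "));
// → hello | hello^.555 | hello^111 | -hello | +hello | hello^.25
```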
It would be better if you first state what it is you are looking to extract.
You also don't indicate which Regular Expression engine you're using, which is important since they vary in their features, but...
Assuming you want to capture only:
words that have a leading + or -
words that have a trailing ^ followed by an optional period followed by one or more digits
and that words are sequences of one or more letters
I'd use:
([a-zA-Z]+\^\.?\d+|[-+][a-zA-Z]+)
which breaks down into:
( # start capture group
[a-zA-Z]+ # one or more letters - note \w matches numbers and underscores
\^ # literal
\.? # optional period
\d+ # one or more digits
| # OR
[+-] # literal plus or minus
[a-zA-Z]+ # one or more letters
) # end of capture group
EDIT
To also capture plain words (without leading or trailing chars) you'll need to rearrange the regexp a little. I'd use:
([+-][a-zA-Z]+|[a-zA-Z]+\^(?:\.\d+|\d+\.\d+|\d+)|[a-zA-Z]+)
which breaks down into:
( # start capture group
[+-] # literal plus or minus
[a-zA-Z]+ # one or more letters - note \w matches numbers and underscores
| # OR
[a-zA-Z]+ # one or more letters
\^ # literal
(?: # start of non-capturing group
\. # literal period
\d+ # one or more digits
| # OR
\d+ # one or more digits
\. # literal period
\d+ # one or more digits
| # OR
\d+ # one or more digits
) # end of non-capturing group
| # OR
[a-zA-Z]+ # one or more letters
) # end of capture group
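If each term can be checked on its own, one alternative (a swapped-in technique, not the one in this answer) is to split on commas and test every trimmed term against this pattern wrapped in ^(?:...)$. A JavaScript sketch; `validTerms` is a hypothetical helper name:

```javascript
// The pattern above, wrapped in ^(?:...)$ so that a whole term must match.
const TOKEN = /^(?:[+-][a-zA-Z]+|[a-zA-Z]+\^(?:\.\d+|\d+\.\d+|\d+)|[a-zA-Z]+)$/;

// Hypothetical helper: split on commas, drop surrounding whitespace,
// and keep only the terms that match TOKEN in full.
function validTerms(src) {
  return src
    .split(",")
    .map((t) => t.trim())
    .filter((t) => TOKEN.test(t));
}

const sample = "hello , hello^.555,hello^111, -hello,+hello, hello+, hello^.25, hello^-1212121";
console.log(validTerms(sample).join(" | "));
// → hello | hello^.555 | hello^111 | -hello | +hello | hello^.25
```

TOKEN has no g flag, so test() keeps no lastIndex state between calls, which makes it safe to use inside filter().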
Also note that, per your updated requirements, this regexp captures both true non-negative numbers (e.g. 0, 1, 1.2, 1.23) and those lacking a leading digit (e.g. .1, .12).
FURTHER EDIT
This regexp will only match the following patterns delimited by commas:
word
word with leading plus or minus
word with trailing ^ followed by a positive number of the form \d+, \d+\.\d+, or \.\d+
([+-][A-Za-z]+|[A-Za-z]+\^(?:\.\d+|\d+(?:\.\d+)?)|[A-Za-z]+)(?=,|\s|$)
Please note that the useful match will appear in the first capture group, not the entire match.
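So, in JavaScript, you could write:
var src = "hello , hello ,hello,+hello,-hello,hello+,hello-,hello^1,hello^1.0,hello^.1",
    RE = /([+-][A-Za-z]+|[A-Za-z]+\^(?:\.\d+|\d+(?:\.\d+)?)|[A-Za-z]+)(?=,|\s|$)/g,
    m;
// With the g flag, each exec() call resumes from RE.lastIndex
while ((m = RE.exec(src)) !== null) {
  console.log(m[1]); // the useful match is capture group 1
}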
which produces:
hello
hello
hello
+hello
-hello
hello^1
hello^1.0
hello^.1

Check the specific number of occurrences of a single character in a string with regex

I'm trying to create a regex pattern for my PowerShell code. I've never worked with regex before, so I'm a total noob.
The regex should check that there are exactly two dots in the string.
Examples that SHOULD work:
3.1.1
5.10.12
10.1.15
Examples that SHOULD NOT work:
3
3.1
5.10.12.1
The string must contain exactly two dots; the number of digits doesn't matter.
I've tried something like this, but it doesn't really work and I think it's far from the right solution...
([\d]*.[\d]*.[\d])
In your current regex you should escape the dot as \.; otherwise the dot matches any character.
You could add anchors for the start ^ and the end $ of the string and update your regex to ^\d*\.\d*\.\d*$
Because \d* also allows zero digits, that would also match strings such as ..4 and ..
Or if you want to match one or more digits, I think you could use ^\d+(?:\.\d+){2}$
That would match
^ # From the beginning of the string
\d+ # Match one or more digits
(?: # Non capturing group
\.\d+ # Match a dot and one or more digits
){2} # Close non capturing group and repeat 2 times
$ # The end of the string
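PowerShell's -match operator uses the .NET regex engine; the sketch below is in JavaScript only to show how ^\d+(?:\.\d+){2}$ behaves on the question's examples, plus "3.." for contrast with ^\d*\.\d*\.\d*$:

```javascript
// Three runs of digits separated by exactly two dots, anchored at both ends.
const version = /^\d+(?:\.\d+){2}$/;

for (const s of ["3.1.1", "5.10.12", "10.1.15", "3", "3.1", "5.10.12.1", "3.."]) {
  console.log(s, version.test(s));
}
// → true, true, true, false, false, false, false
```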
Use a lookahead:
^\d(?=(?:[^.]*\.[^.]*){2}$)[\d.]*$
Broken down, this says:
^ # start of the line
\d # one digit (the string must start with a digit)
(?= # start of lookahead
(?:[^.]*\.[^.]*){2} # non-dots, a dot, non-dots, repeated twice
$ # anchor it to the end of the string
)
[\d.]* # only digits and dots, 0+ times
$ # the end of the string
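A JavaScript sketch of the lookahead version on the same inputs. Unlike ^\d+(?:\.\d+){2}$, it also accepts strings with empty segments such as "3.." and "3.1.", since [\d.]* does not require digits between the dots:

```javascript
// Starts with a digit, contains only digits and dots, and the lookahead
// demands exactly two dots before the end of the string.
const twoDots = /^\d(?=(?:[^.]*\.[^.]*){2}$)[\d.]*$/;

for (const s of ["3.1.1", "5.10.12", "10.1.15", "3", "3.1", "5.10.12.1", "3..", "3.1."]) {
  console.log(s, twoDots.test(s));
}
// → true, true, true, false, false, false, true, true
```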
See a demo on regex101.com.