Regex to match unique characters NOT in a set


I'd like to match each unique character that is NOT one of "ymd".
Example 1:
mm-dd-yyyy should match only one character: -
Example 2:
d. m. y. should match only one . character and one whitespace character
I've tried a negative lookahead using this pattern:
/([^ymd]+\b)(?!.*\1\b)/
It works, but the match for example 2 is ". " (one two-character match).
Ideally, I'd like it to find two single-character matches: "." and one whitespace character.

First, simply match single characters. Be sure to put them in a group. This will make all non-ymd characters match individually:
([^ymd])
Then, add a negative lookahead with a backreference. This makes only the last occurrence of each character match:
(?!.*\1)
Full solution:
([^ymd])(?!.*\1)
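The pattern can be checked against both examples from the question with a short sketch. Python and the helper name last_unique are choices made here for illustration; they are not part of the answer.

```python
import re

# Pattern from the answer: one non-ymd character that does not occur again later.
PATTERN = re.compile(r"([^ymd])(?!.*\1)")

def last_unique(text):
    """Hypothetical helper: return each non-ymd character once (its last occurrence)."""
    # With a single capturing group, findall returns the group contents.
    return PATTERN.findall(text)

print(last_unique("mm-dd-yyyy"))  # ['-']
print(last_unique("d. m. y."))    # [' ', '.']
```

Note that the matches come back in the order of each character's last occurrence, so for the second example the space is reported before the dot.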

Related

How to match lines in a numbered list with a regex

I want to search for all lines that:
start with a numeric-repeat (one or several times)
this numeric-repeat is not followed by dot and a whitespace character
either a single dot after the numeric-repeat or a letter is okay
Given Lines
1. TEST 1 : DataLogFile
11. TEST 2 : Inter Citro File
111. TEST 3 : Inter Citro File
111.TEST4 : Match this
111TEST4 : Match this
Expected Result
Should only match last 2 lines
111.TEST4 : Match this
111TEST4 : Match this
1. Regex
I tried the regex ^[0-9]+(?!. ).* to match only the last row, because there is no whitespace character after the dot.
Tested in Regex101
1. Actual Result
It matched the last 4 lines:
11. TEST 2 : Inter Citro File
111. TEST 3 : Inter Citro File
111.TEST4 : Match this
111TEST4 : Match this
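This result can be reproduced outside Regex101. The sketch below (Python is used only for illustration, and the variable names are made up here) suggests that backtracking is the cause: when the lookahead fails after all digits have been consumed, the engine gives one digit back and retries, and the unescaped . in the lookahead then consumes the next digit instead of the dot.

```python
import re

LINES = [
    "1. TEST 1 : DataLogFile",
    "11. TEST 2 : Inter Citro File",
    "111. TEST 3 : Inter Citro File",
    "111.TEST4 : Match this",
    "111TEST4 : Match this",
]

# The attempted pattern, unchanged.
attempt = re.compile(r"^[0-9]+(?!. ).*")
matched = [line for line in LINES if attempt.match(line)]
print(matched)  # every line except the first

# Capture how many digits [0-9]+ keeps on a line that should have been rejected.
digits = re.match(r"^([0-9]+)(?!. )", "11. TEST 2 : Inter Citro File").group(1)
print(digits)  # '1': one digit was given back so that the lookahead succeeds
```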
2. Regex like answered
When I try SaSkY's first answer, ^\d+\.\S.*,
it only matches lines that have digits, then a dot, then a non-blank character, then any characters. See Demo
But it does not match input that has no dot after the digits,
although 111TEST4 : Match this is also expected to match.

Try this:
^\d+(?:\.\S|[A-Za-z]).*
^ start of the line.
\d+ one or more digits.
(?:\.\S|[A-Za-z]) non-capturing group:
\. a literal dot.
\S any character except a whitespace character.
| OR.
[A-Za-z] a letter.
.* zero or more characters.
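As a quick check, the pattern can be run over the lines from the question. This is a minimal sketch in Python; the list and variable names are assumptions made for the example.

```python
import re

# Pattern from the answer: digits, then either ".<non-space>" or a letter.
PATTERN = re.compile(r"^\d+(?:\.\S|[A-Za-z]).*")

LINES = [
    "1. TEST 1 : DataLogFile",
    "11. TEST 2 : Inter Citro File",
    "111. TEST 3 : Inter Citro File",
    "111.TEST4 : Match this",
    "111TEST4 : Match this",
]

# Keep only the lines the pattern accepts.
kept = [line for line in LINES if PATTERN.match(line)]
for line in kept:
    print(line)
# 111.TEST4 : Match this
# 111TEST4 : Match this
```

Because the character after the digits must be a letter or a dot followed by a non-space, giving back digits through backtracking cannot produce a match on the "11. " and "111. " lines.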

You can try:
^(\d)\1*+(?!\.?\s+).*$
Or if you want just a number at the beginning (not repeating numbers such as 111):
^\d++(?!\.?\s+).*$

You should have stated your expectations clearly before asking.
If you would like to
match: any "identifier" or word that is prefixed either with a number (e.g. 1Hello) or with an ordinal (e.g. 2.World)
But not: a phrase containing a space, as in a numbered-list entry (e.g. 1. Hello)
Simple regex sequentially built
Then ^\d+\.?[a-zA-Z].*
Matches:
111.TEST4 : Match this
111TEST5: Match this
111test6: Match this
But not those numbered-list items having separating spaces inside.
It also does not match anything starting with a letter.
Those do not match:
1. TEST 1 : DataLogFile
11. TEST 2 : Inter Citro File
111. TEST 3 : Inter Citro File
test7: should not match
💡️ So you can apply this regex on lines to filter for poorly formatted numbered-list entries.
Explained the sequence
^ beginning of line
\d+ one or more digits (a number)
\.? an optional dot (raw dots need to be escaped by backslash!)
[a-zA-Z] any alphabetic letter from the range (lower or uppercase)
.* anything else (here the unescaped dot has special meaning "any character")
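The filtering use case mentioned above can be sketched as follows. Python and the names used here are illustrative assumptions; the sample lines are the ones listed in this answer.

```python
import re

# Pattern from this answer: digits, an optional dot, then a letter.
PATTERN = re.compile(r"^\d+\.?[a-zA-Z].*")

SAMPLES = [
    "111.TEST4 : Match this",
    "111TEST5: Match this",
    "111test6: Match this",
    "1. TEST 1 : DataLogFile",
    "11. TEST 2 : Inter Citro File",
    "111. TEST 3 : Inter Citro File",
    "test7: should not match",
]

# Lines where the number (and optional dot) runs straight into the text.
poorly_formatted = [s for s in SAMPLES if PATTERN.match(s)]
print(poorly_formatted)
```

Only the first three samples are reported; the well-formed list entries and the line starting with a letter are skipped.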

How to conditionally expect particular characters if a prior regex matched?

I want to expect some characters only if a prior regex matched. If not, no characters (empty string) is expected.
For instance, if after the first four characters appears a string out of the group (A10, B32, C56, D65) (kind of enumeration) then a "_" followed by a 3-digit number like 123 is expected. If no element of the mentioned group appears, no other string is expected.
My first attempt was this but the ELSE branch does not work:
^XXX_(?<DT>A12|B43|D14)(?(DT)(_\d{1,3})|)\.ZZZ$
XXX_A12_123.ZZZ --> match
XXX_A11.ZZZ --> match
XXX_A12_abc.ZZZ --> no match
XXX_A23_123.ZZZ --> no match
These are examples of filenames. If the filename contains a string of the mentioned group, like A12 or C56, then I expect this element to be followed by an underscore and 1 to 3 digits. If the filename does not contain a string of that group (no character, or a character sequence different from the strings in the group), then I don't want to see the underscore followed by 1 to 3 digits.
For instance, I could extend the regex to
^XXX_(?<DT>A12|B43|D14)_\d{5}(?(DT)(_\d{1,3})|)_someMoreChars\.ZZZ$
...and then I want these filenames to be valid:
XXX_A12_12345_123_wellDone.ZZZ
XXX_Q21_00000_wellDone.ZZZ
XXX_Q21_00000_456_wellDone.ZZZ
...but this is invalid:
XXX_A12_12345_wellDone.ZZZ
How can I make the ELSE branch of the conditional statement work?
In the end I intend to have two groups like
Group A: (A11, B32, D76, R33)
Group B: (A23, C56, H78, T99)
If an element of group A occurs in the filename then I expect to find _\d{1,3} in the filename.
If an element of group B occurs in the filename then the _\d{1,3} shall be optional (it may or may not occur in the filename).
I ended up in this regex:
^XXX_(?:(?<DT>A12|B43|D14))?(?(DT)(_\d{5}_\d{1,3})|(?!(?&DT))(?!.*_\d{3}(?!\d))).*\.ZZZ$
^XXX_(?:(?<DT>A12|B43|D14))?_\d{5}(?(DT)(_\d{1,3})|(?!(?&DT))(?!.*_\d{3}(?!\d))).+\.ZZZ$
Since I have to use this regex in the OpenApi @Pattern annotation, I run into this error:
Conditionals are not supported in this regex dialect.
As @The fourth bird suggested, alternation seems to do the trick:
XXX_((((A12|B43|D14)_\d{5}_\d{1,3}))|((?:(A10|B10|C20)((?:_\d{5}_\d{3})|(?:_\d{3}))))).*\.ZZZ$

The else branch is the part after the |, but if you also want to match the 2nd example, the if clause would not work, as you have already matched one of A12|B43|D14.
The named capture group is not optional, so the if clause will always be true.
What you can do instead is use an alternation to match either the enumeration part followed by an underscore and 1 to 3 digits, or an uppercase char and 2 digits.
^XXX_(?:(?<DT>A12|B43|D14)_\d{1,3}|[A-Z]\d{2})\.ZZZ$
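The alternation can be tried against the four filenames from the question. The sketch below uses Python purely for illustration, which is an assumption about the target engine: Python's re module spells named groups (?P<DT>...) rather than (?<DT>...), so the pattern is adapted accordingly.

```python
import re

# Alternation from the answer, with the named group written in Python syntax.
PATTERN = re.compile(r"^XXX_(?:(?P<DT>A12|B43|D14)_\d{1,3}|[A-Z]\d{2})\.ZZZ$")

FILENAMES = [
    "XXX_A12_123.ZZZ",
    "XXX_A11.ZZZ",
    "XXX_A12_abc.ZZZ",
    "XXX_A23_123.ZZZ",
]

# Map each filename to True (match) or False (no match).
verdicts = {name: PATTERN.match(name) is not None for name in FILENAMES}
for name, ok in verdicts.items():
    print(name, "--> match" if ok else "--> no match")
```

The verdicts agree with the match / no match list in the question.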
If you want to make use of the if/else clause, you can make the named capture group optional, and then check if group 1 exists.
^XXX_(?<DT>A12|B43|D14)?(?(DT)_\d{1,3}|[A-Z]\d{2})\.ZZZ$
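For engines that do support conditionals, the if/else form can be checked in the same way. Python's re module is one such engine (an assumption about where the pattern would run); it needs the (?P<DT>...) spelling for the named group, while the (?(DT)yes|no) test is written as in the answer.

```python
import re

# Optional named group, then a conditional on whether DT took part in the match.
PATTERN = re.compile(r"^XXX_(?P<DT>A12|B43|D14)?(?(DT)_\d{1,3}|[A-Z]\d{2})\.ZZZ$")

def check(name):
    """Hypothetical helper: return the DT group for a match, or False for no match."""
    m = PATTERN.match(name)
    return m.group("DT") if m else False

print(check("XXX_A12_123.ZZZ"))  # 'A12' (the "if" branch was used)
print(check("XXX_A11.ZZZ"))      # None  (matched through the "else" branch)
print(check("XXX_A12_abc.ZZZ"))  # False
print(check("XXX_A23_123.ZZZ"))  # False
```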
For the updated question:
^XXX_(?<DT>A12|B43|D14)?(?(DT)(?:_\d{5})?_\d{3}(?!\d)|(?!A12|B43|D14|[A-Z]\d{2}_\d{3}(?!\d))).*\.ZZZ$
The pattern matches:
^ Start of string
XXX_ Match literally
(?<DT>A12|B43|D14)?
(?(DT) If we have group DT
(?:_\d{5})? Optionally match _ and 5 digits
_\d{3}(?!\d) Match _ and 3 digits
| Or
(?! Negative lookahead, assert not to the right
A12|B43|D14| Match one of the alternatives, or
[A-Z]\d{2}_\d{3}(?!\d) Match 1 char A-Z, 2 digits _ 3 digits not followed by a digit
) Close lookahead
) Close if clause
.* Match the rest of the line
\.ZZZ Match . and ZZZ
$ End of string
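The breakdown above can be exercised with the longer filenames from the updated question. This is a sketch under the assumption that Python's re module is an acceptable stand-in for the original engine; the only change to the pattern is the (?P<DT>...) spelling of the named group.

```python
import re

PATTERN = re.compile(
    r"^XXX_(?P<DT>A12|B43|D14)?"
    r"(?(DT)(?:_\d{5})?_\d{3}(?!\d)"                 # DT present: require _ and 3 digits
    r"|(?!A12|B43|D14|[A-Z]\d{2}_\d{3}(?!\d)))"      # DT absent: forbid those prefixes
    r".*\.ZZZ$"
)

VALID = [
    "XXX_A12_12345_123_wellDone.ZZZ",
    "XXX_Q21_00000_wellDone.ZZZ",
    "XXX_Q21_00000_456_wellDone.ZZZ",
]
INVALID = ["XXX_A12_12345_wellDone.ZZZ"]

for name in VALID + INVALID:
    print(name, bool(PATTERN.match(name)))
```

The three filenames the question lists as valid match, and the one listed as invalid does not.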

Regex for a string with alpha numeric containing a '.' character

I have not been able to find a proper regex to match strings that neither start nor end with a '.' character.
This matches
AS.E
23.5
3.45
This doesn't match
.263
321.
.ASD
The string can consist of alphanumeric characters with an optional '.' character, and its length has to be within the range of 2-4 (minimum 2 characters, maximum 4 characters).
I was able to create this one:
^[^\.][A-Z|0-9|\.]{2,4}$
but with it I couldn't exclude a '.' character at the end of the string.
Thanks.

Maybe not the most optimized but a working one. Created step by step:
The first character should be alphanumeric
^[a-zA-Z0-9]
0, 1 or 2 character alphanumeric or . but not matching end of string
[a-zA-Z0-9\.]{0,2}
an alphanumeric character matching end of string
[a-zA-Z0-9]$
Concatenate all of this to obtain your regex
^[a-zA-Z0-9][a-zA-Z0-9\.]{0,2}[a-zA-Z0-9]$
Edit: This regex allows multiple dots (up to 2)
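The concatenated regex can be checked against the examples from the question. The snippet is a small Python sketch; the names are illustrative assumptions.

```python
import re

# Alphanumeric first and last character, up to two alphanumerics or dots between.
PATTERN = re.compile(r"^[a-zA-Z0-9][a-zA-Z0-9\.]{0,2}[a-zA-Z0-9]$")

SHOULD_MATCH = ["AS.E", "23.5", "3.45"]
SHOULD_NOT_MATCH = [".263", "321.", ".ASD"]

for s in SHOULD_MATCH + SHOULD_NOT_MATCH:
    print(repr(s), bool(PATTERN.match(s)))

# As the edit note says, two dots in the middle are accepted as well.
print(bool(PATTERN.match("A..B")))  # True
```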

If I guessed correctly, you want to match all words that are
Between 2 and 4 characters long ...
... and start and end with a character from [A-Z0-9] ...
... and have characters from [A-Z0-9.] in the middle ...
... and are not preceded or followed by a dot (.).
Try this regex to match all these substrings in a text:
(?<=^|[^.])[A-Z0-9][A-Z0-9.]{0,2}[A-Z0-9](?=$|[^.])
However, note that this will match the AA in the string .AAAA. (with the dots). If you don't want this match, then please give more details on your requirements.
When you are only interested in the number of matches, but not the matched strings, then you could use
(^|[^.])[A-Z0-9][A-Z0-9.]{0,2}[A-Z0-9]($|[^.])
If you have one string, and want to know whether that string completely matches or not, then use
^[A-Z0-9][A-Z0-9.]{0,2}[A-Z0-9]$
If there may be at most one . inside the match, replace the part [A-Z0-9.]{0,2} with ([A-Z0-9]?[A-Z0-9.]?|[A-Z0-9.]?[A-Z0-9]?).
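The whole-string variant with at most one dot can be sketched like this. Python is used only for illustration, and the group is written as non-capturing here, which is a small deviation from the replacement text above. The lookaround version is not shown because Python's re module rejects variable-width lookbehinds such as (?<=^|[^.]).

```python
import re

# Whole-string check with the "at most one dot" middle part substituted in.
ONE_DOT = re.compile(
    r"^[A-Z0-9](?:[A-Z0-9]?[A-Z0-9.]?|[A-Z0-9.]?[A-Z0-9]?)[A-Z0-9]$"
)

print(bool(ONE_DOT.match("AS.E")))  # True
print(bool(ONE_DOT.match("A.1B")))  # True
print(bool(ONE_DOT.match("A..B")))  # False: two dots in the middle
print(bool(ONE_DOT.match("321.")))  # False: ends with a dot
```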

You can use this pattern to match what you say,
^[^\.][a-zA-Z0-9\.]{2,4}[^\.]$
Check the result here:
https://regex101.com/r/8BNdDg/3

Why doesn't the regex ^([0|1]1)+$ match the string "111"?

I'm trying to write a regex to match binary strings where every odd character is a 1.
I came up with this:
^([0|1]1)+$
My logic:
^ matches the start of the line
( starts a capture group
[0|1] match a 0 or 1 (since the 0th position is even)
1 the previous character (0 or 1) must be followed by a 1
+ repeat the previous pattern one or more times
$ matches the end of the line
So by my logic, the above regex should match binary strings where every other character (with the first "other" character being the second one in the string) is a 1.
However, it doesn't work correctly. As an example, the string 111 is not matched.
Why isn't it working and what should I change to make it work?

If you need every odd character to be a 1, then you need something more like this:
^([01]1)*[01]?$
The first character can be anything, the next has to be 1, then repeated several times while the last character can be 0 or 1.
The pipe in your character class is not needed, and is actually making your regex also match a pipe character. So remove it entirely. You use the pipe in groups (i.e. (?: ... ) or ( ... ) to denote alternation).
The above will also match an empty string, so you could add (?=.) at the beginning to force matching at least 1 character (i.e. ^(?=.)([01]1)*[01]?$).
The above will match where you have (where x is either 0 or 1):
x
x1
x1x
x1x1
x1x1x
x1x1x1
etc.
Your current regex, on the other hand, is attempting to match an even number of characters. You repeat the group ([0|1]1), which matches exactly 2 characters (no more, no less), so the length of your whole match will be a multiple of 2.
Adding the optional [01] at the end allows for strings with odd number of characters to match.
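The difference between the two patterns can be seen in a short comparison. This is a Python sketch; the names ORIGINAL and FIXED are made up for the example.

```python
import re

ORIGINAL = re.compile(r"^([0|1]1)+$")         # pattern from the question
FIXED = re.compile(r"^(?=.)([01]1)*[01]?$")   # suggested pattern, non-empty variant

for s in ["111", "0111", "0", "101", "|1", ""]:
    print(repr(s), bool(ORIGINAL.match(s)), bool(FIXED.match(s)))
# '111'  False True   (odd length is now accepted)
# '0111' True  True
# '0'    False True
# '101'  False False  (second character is not 1)
# '|1'   True  False  (the stray pipe in the character class)
# ''     False False
```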

Your regex is for even-length strings only. [01] and 1 each match a character, therefore your capturing group matches 2 characters.
This modifies your regex to accept odd-length strings:
^([01](1|$))+$

Firstly, the [0|1] should read [01]. Otherwise you have a character class that matches 0, | or 1.
Now, [01]1 matches exactly two characters. Thus ([01]1)+ cannot match a string whose length is not a multiple of two.
To make it work with inputs of odd length, change the regex to
^(([01]1)+[01]?|1)$

You can use this pattern:
^1?([01]1)+$|^1$
or
^(1?([01]1)+|1)$
To deal with an odd or even number of digits, you need to put an optional 1? at the beginning. To ensure that there is at least one digit, you can't use a * quantifier for the group, otherwise the pattern can match the empty string. This is why you need to use + for the group and add the case of a single 1.

Regular expression matching specific letter combos

I need to match the following example strings:
LA20517505
BN30116471
I tried this: [LA|BN].\d{8}
That does indeed match, but it also matches other letters as well. I specifically need to match "LA" or "BN" followed by 8 numbers.

Don't use brackets here but parentheses: (LA|BN)\d{8}
Explanation:
(LA|BN) Match character sequences LA or BN
\d{8} followed by 8 digits
whereas the initial regex [LA|BN].\d{8} can be read as :
[LA|BN] Match either character L,A,|,B or N
. Match any character
\d{8} followed by 8 digits
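The two readings can be compared directly. In this Python sketch the sample string AX12345678 is made up for illustration, and fullmatch is used so that the whole string has to fit the pattern.

```python
import re

BRACKETS = re.compile(r"[LA|BN].\d{8}")   # initial attempt: a character class
GROUPED = re.compile(r"(LA|BN)\d{8}")     # suggested fix: a group with alternation

for s in ["LA20517505", "BN30116471", "AX12345678"]:
    print(s, bool(BRACKETS.fullmatch(s)), bool(GROUPED.fullmatch(s)))
# LA20517505 True True
# BN30116471 True True
# AX12345678 True False  (the class accepts A, and . accepts any character)
```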
