Regex matching zero or none and Or - regex

How do I match a sequence of optional characters or a different character?
For example:
I started with matching the letters "KQkq"
these are in sequence but optional, so "K?Q?k?q?"
however the input is either one of those four letters or "-", so I tried "(K?Q?k?q?|-)"
this works for the letters, but won't match the "-"
If the letters weren't optional I'd use "(KQkq|-)", which works fine.
I've tried a number of different things, like putting the letters in a group "((K?Q?k?q?)|-)" but I can't quite find a way to express what I need.
*** Note: As I stated in the question: I'm matching the letters "KQkq" "in sequence but optional". Sequence means they come one after the other so "KQkq" is valid, "KkQq" is not valid, nor is "kqKQ" or "kkkk" or anything else that doesn't match the sequence KQkq. Optional means that a character may or may not exist. So "KQkq" is valid, as is "K" or "Kk" or "Qkq". Character classes, for those that don't know, will match any of the characters in the class with no sense of sequence. So [KQkq]{1,4} would indeed match "KQkq" and "Qkq" however it would also match "KKKK", "qkQK", "qqqq" none of which are valid.

^(?:(?:K?Q?k?q?)|-)$
Try this. See the demo:
https://regex101.com/r/gQ3kS4/2

Your regex is working fine; to capture the dash you just need to anchor it:
^(K?Q?k?q?|-)$
Without anchors, the first alternative K?Q?k?q? can match the empty string, so on the input - it succeeds with an empty match before the - alternative is ever tried.
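A minimal sketch of the difference, assuming Python's re module (the question does not name a language); the variable names are only illustrative:

```python
import re

unanchored = re.compile(r"(K?Q?k?q?|-)")
anchored = re.compile(r"^(K?Q?k?q?|-)$")

# Unanchored: the first alternative succeeds with an empty match at
# position 0, so the "-" alternative is never tried.
print(repr(unanchored.search("-").group()))  # ''

# Anchored: an empty match cannot reach "$", so the engine backtracks
# into the second alternative and consumes the dash.
print(repr(anchored.search("-").group()))  # '-'

# Valid and invalid inputs taken from the question.
for text in ["KQkq", "K", "Kk", "Qkq", "-", "KkQq", "kkkk"]:
    print(text, bool(anchored.search(text)))
```

Note that the anchored pattern also accepts the empty string, since all four letters are optional; if that is not wanted, a lookahead such as (?=.) after ^ is one way to rule it out.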

Have you tried ([KQkq]|-) or even ([KQkq]|[-])?

Try using square brackets, like this: /[KQkq]|-/. A character class in square brackets matches exactly one character, which can be any of the characters listed between the brackets.

I think this will do what you need: (K?Q?k?q?|-)

Related

if else failing regex

I want to write a regex that checks whether a word is at least 3 characters long, optionally preceded by m_. If m_ is present, it must be followed by at least 3 characters.
Basically I want to match
Abc or m_abc, and not match ab or m_ac.
(^(m_))?([a-zA-Z0-9_]{3,})|(^[a-zA-Z0-9_]{3,}$)
I tried an if/else-style pattern, but it also matches the text m_a.
Can you please help me see what I am missing here?
Maybe I wrote my regex wrong.
I want something like
if(m_ found)
"followed by 3 characters required"
else
"Look if total number of characters is 3"
Thanks.
You could use this regular expression, which either requires the m_ at the start or forbids it (by negative look-ahead):
^(m_|(?!m_))\w{3,}$
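As a quick check of this pattern against the examples from the question, here is a small sketch in Python (an assumption, since the question names no language); is_valid is a hypothetical helper name:

```python
import re

# Either consume a leading "m_", or assert that the word does not start
# with "m_"; in both cases at least three word characters must follow.
pattern = re.compile(r"^(m_|(?!m_))\w{3,}$")

def is_valid(word):
    """Return True if the word satisfies the rule from the question."""
    return pattern.search(word) is not None

for word in ["Abc", "m_abc", "ab", "m_ac", "m_a"]:
    print(word, is_valid(word))
```

The lookahead branch is what stops m_ac from being accepted as a plain four-character word.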
If negative look-ahead is not a feature you can use, then you could go for this more elaborate regex, which goes through the different options for the first two characters:
^(m_\w{3,}|m[A-Za-z0-9]\w+|[A-Za-ln-z0-9]\w{2,})$
Do you want your 3-character word to be able to have underscores in it? Because if not, then you can change [a-zA-Z0-9_] to [a-zA-Z0-9], and it should not match m_a in that case. And unless you want to match numerals, you can simplify it further to [a-zA-Z].
The main error is that the ^ line anchor is inside the optional parenthesized expression. You want beginning of line unconditionally, followed by an optional m_, that is, (m_)?.
You can simplify the rest significantly. Three or more is captured by an expression which requires three characters; the regex will succeed at that point, whether or not you are at the end of the input.
^(m_)?[a-zA-Z0-9_]{3}
The underscore in the character class seems somewhat dubious. Do you really intend for a "word" to include underscores? Then m_ac will also match, because it is at least three characters long and consists of characters in the set, even though you say it is specifically disallowed.
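The backtracking behind that remark can be seen in a short sketch, assuming Python's re module; loose and strict are illustrative names, and strict is simply the same pattern with the underscore dropped from the class:

```python
import re

loose = re.compile(r"^(m_)?[a-zA-Z0-9_]{3}")   # underscore allowed in the word
strict = re.compile(r"^(m_)?[a-zA-Z0-9]{3}")   # underscore removed from the class

# With the underscore in the class, (m_)? can give up its match and let
# the class consume "m", "_" and the next character instead.
print(bool(loose.match("m_ac")))    # True
print(bool(loose.match("m_a")))     # True

# Without the underscore, "m_" can only be consumed by the optional prefix.
print(bool(strict.match("m_ac")))   # False
print(bool(strict.match("m_abc")))  # True
print(bool(strict.match("Abc")))    # True
print(bool(strict.match("ab")))     # False
```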

Match pattern anywhere in string?

I want to match the following pattern:
Exxxx49 (where x is a digit 0-9)
For example, E123449abcdefgh, abcdefE123449987654321 are both valid. I.e., I need to match the pattern anywhere in a string.
I am using:
^*E[0-9]{4}49*$
But it only matches E123449.
How can I allow any amount of characters in front or after the pattern?
Remove the ^ and $ to search anywhere in the string.
In your case the * are probably not what you intended; E[0-9]{4}49 should suffice. This will find an E, followed by four digits, followed by a 4 and a 9, anywhere in the string.
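A minimal sketch of the unanchored search, assuming Python (the question does not say which language is in use):

```python
import re

pattern = re.compile(r"E[0-9]{4}49")

# re.search scans the whole string, so no leading or trailing
# wildcard is needed to find the pattern in the middle of the text.
for text in ["E123449abcdefgh", "abcdefE123449987654321", "E12349"]:
    match = pattern.search(text)
    print(text, match.group() if match else None)
```

The first two inputs come from the question; the third is a made-up counterexample with only three digits before the 49.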
I would go for
^.*E[0-9]{4}49.*$
EDIT:
since it fulfills all requirements stated by the OP:
"[match] Exxxx49 (where x is digit 0-9)"
"allow for any amount of characters in front or after pattern"
It will match
^.* everything from, including the beginning of the line
E[0-9]{4}49 the requested pattern
.*$ everything after the pattern, including the end of the line
Your original regex had a regex pattern syntax error at the first *. Fix it and change it to this:
.*E\d{4}49.*
This pattern is for APIs that implicitly anchor the match to the whole string, such as Java's String.matches(); you did not specify a language.
.* matches any number of characters (other than line breaks). As it surrounds the pattern, the regex matches the entire string as long as the pattern occurs somewhere in it.
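For comparison, Python's re.fullmatch behaves like a whole-string matcher, so it can stand in for an implicitly anchored API in a short sketch (Python is an assumption here; the sample string is made up):

```python
import re

text = "abcdefE123449987654321"

# A whole-string match fails without the surrounding wildcards...
print(re.fullmatch(r"E\d{4}49", text))               # None

# ...and succeeds once .* is allowed on both sides of the pattern.
print(bool(re.fullmatch(r".*E\d{4}49.*", text)))     # True

# A scanning search needs no wildcards at all.
print(re.search(r"E\d{4}49", text).group())          # E123449
```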
Just simply use this:
E[0-9]{4}49
How do I allow for any amount of characters in front or after pattern? but it only matches E123449
Use global flag /E\d{4}49/g if supported by the language
OR
Try a capturing group, (E\d{4}49)+, formed by enclosing the pattern in parentheses (...)

Exclude strings of pattern "abba"

For example, I want to exclude 'fitting', 'hollow', 'trillion'
but not 'hello' or 'pattern'
I already got the following to work
(.)(.)\2\1
which matches 'hollow' or 'fitting', but I have trouble negating this.
the closest thing I get is
^.(?!(.)(.)\2\1)
which excludes 'fitting' and 'hollow' but not 'trillion'
It's a little different from what you have. Your current regex only checks for the palindromic pattern starting at the second character. Since you want to check the whole string, you need to change it a little to:
^(?!.*(.)(.)\2\1)
The first anchor will ensure that the check is made only at the beginning (otherwise, the regex can still claim a match at a later position, for example at the end of the string).
Then the .* within the negative lookahead will enable the check to be done anywhere within the string. If there's any match, fail the entire match.
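The words from the question can be checked with a short sketch, assuming Python's re module; keeps is a hypothetical helper name:

```python
import re

# The anchored negative lookahead rejects any string that contains
# an "abba"-shaped run anywhere.
no_abba = re.compile(r"^(?!.*(.)(.)\2\1)")

# The asker's attempt only inspects the position after the first character.
attempt = re.compile(r"^.(?!(.)(.)\2\1)")

def keeps(regex, word):
    """Return True if the regex accepts (does not exclude) the word."""
    return regex.search(word) is not None

for word in ["fitting", "hollow", "trillion", "hello", "pattern"]:
    print(word, keeps(no_abba, word), keeps(attempt, word))
```

With the asker's pattern, trillion is still accepted because its illi run starts at the third character, not the second.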
trillion is not excluded because you wrote ^., which requires exactly one character between the start of the string and the position where the lookahead is checked. That happens to work for your first two cases, where a single character (h and f) precedes the pattern. If you change it to ^..(?!(.)(.)\2\1), it will work for trillion.
So in general the regex will be:
^(?!.*(.)(.)\2\1)
where the .* allows any number of characters (other than \n) before the pattern.

Limiting RegEx to match only a string of 1-254 characters length

This is my RegEx:
"^[^\.]([\w-\!\#\$\%\&\'\*\+\-\/\=\`\{\|\}\~\?\^]+)([\.]{0,1})([\w-\!\#\$\%\&\'\*\+\-\/\=\`\{\|\}\~\?\^]+)[^\.]#((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,6}|[0-9]{1,3})(\]?)$"
I need to match only strings less than 255 characters.
I've tried adding a lookahead at the start of the RegEx but it fails:
"^(?=.{1,254})[^\.]([\w-\!\#\$\%\&\'\*\+\-\/\=\`\{\|\}\~\?\^]+)([\.]{0,1})([\w-\!\#\$\%\&\'\*\+\-\/\=\`\{\|\}\~\?\^]+)[^\.]#((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,6}|[0-9]{1,3})(\]?)$"
You need the $ in the lookahead to make sure it's only up to 254. Otherwise, the lookahead will match even when there are more than 254.
(?=.{1,254}$)
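The effect of the $ can be isolated in a small sketch, assuming Python; \w+ is only a stand-in for the much longer body of the real pattern:

```python
import re

# Without "$" the lookahead only proves that at least one character exists.
loose = re.compile(r"^(?=.{1,254})\w+$")

# With "$" the lookahead must reach the end within 254 characters.
strict = re.compile(r"^(?=.{1,254}$)\w+$")

ok = "a" * 254
too_long = "a" * 255

print(bool(loose.search(ok)), bool(loose.search(too_long)))    # True True
print(bool(strict.search(ok)), bool(strict.search(too_long)))  # True False
```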
Also, keep in mind that you can greatly simplify your regex because many characters that would usually need to be escaped do not need to when in a character class (square brackets).
"[\w-\!\#\$\%\&\'\*\+\-\/\=\`\{\|\}\~\?\^]"
is the same as this:
"[-\w!#$%&'*+/=`{|}~?^]"
Note that an unescaped dash must be first (or last) in the character class to be a literal dash, and an unescaped caret must not be first.
With some other simplifications, here is the complete string:
"^(?=.{1,254}$)[-\w!#$%&'*+/=`{|}~?^]+(\.[-\w!#$%&'*+/=`{|}~?^]+)*#((\d{1,3}\.){3}\d{1,3}|([-\w]+\.)+[a-zA-Z]{2,6})$"
Notes:
I removed the stipulation that the first char shouldn't be a period ([^.]) because the next character class doesn't match a period anyway, so it's redundant.
I removed many extraneous parens
I replaced [0-9] with \d
I replaced {0,1} with the shorthand "?"
After the # sign, it seemed that you were trying to match an IP address or text domain name, so I separated them more so it couldn't be a combination
I'm not sure what the optional square bracket at the end was for, so I removed it: "(]?)"
I tried it in Regex Hero, and it works. See if it works for you.
This depends on what language you are working in. In Python, for example, you can use a regex to split a text into separate strings, and then use len() to discard any string longer than the 254 characters you allow.
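A rough sketch of that approach, assuming the candidates are separated by whitespace (the question does not say how they are delimited); MAX_LEN and short_tokens are hypothetical names:

```python
import re

MAX_LEN = 254  # the question asks for strings shorter than 255 characters

def short_tokens(text):
    """Split on runs of whitespace and keep tokens of 1 to MAX_LEN characters."""
    tokens = re.split(r"\s+", text.strip())
    return [token for token in tokens if 0 < len(token) <= MAX_LEN]

sample = "ab " + "x" * 300 + " cd"
print(short_tokens(sample))  # ['ab', 'cd']
```

The surviving tokens could then be validated with the main pattern, which no longer needs to enforce the length itself.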
I think this post will help. It shows how to limit certain patterns but I am not sure how you would add it to the entire regex.

Matching parts of string that contain no consecutive dashes

I need a regex that will match strings of letters that do not contain two consecutive dashes.
I came close with this regex that uses lookaround (I see no alternative):
([-a-z](?<!--))+
Which given the following as input:
qsdsdqf--sqdfqsdfazer--azerzaer-azerzear
Produces three matches:
qsdsdqf-
sqdfqsdfazer-
azerzaer-azerzear
What I want however is:
qsdsdqf-
-sqdfqsdfazer-
-azerzaer-azerzear
So my regex loses the first dash, which I don't want.
Who can give me a hint or a regex that can do this?
This should work:
-?([^-]-?)*
It makes sure that there is at least one non-dash character between every two dashes.
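Applied to the input from the question, a sketch in Python (an assumption, the question names no language) looks like this; the group is made non-capturing because only the whole match is of interest:

```python
import re

pattern = re.compile(r"-?(?:[^-]-?)*")
text = "qsdsdqf--sqdfqsdfazer--azerzaer-azerzear"

# Every part of the pattern is optional, so it can also match the empty
# string (for example at the very end); those matches are filtered out.
parts = [m.group() for m in pattern.finditer(text) if m.group()]
print(parts)  # ['qsdsdqf-', '-sqdfqsdfazer-', '-azerzaer-azerzear']
```

Note that [^-] accepts any character other than a dash, not only letters; swap in [a-z] if the input may contain other characters that should end a match.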
Looks to me like you do want to match strings that contain double hyphens, but you want to break them into substrings that don't. Have you considered splitting it between pairs of hyphens? In other words, split on:
(?<=-)(?=-)
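A sketch of that split, assuming Python 3.7 or later, where re.split accepts patterns that match the empty string:

```python
import re

text = "qsdsdqf--sqdfqsdfazer--azerzaer-azerzear"

# Split at every position that has a dash on both sides, without
# consuming either dash.
parts = re.split(r"(?<=-)(?=-)", text)
print(parts)  # ['qsdsdqf-', '-sqdfqsdfazer-', '-azerzaer-azerzear']
```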
As for your regex, I think this is what you were getting at:
(?:[^-]+|-(?<!--)|\G-)+
The -(?<!--) will match one hyphen, but if the next character is also a hyphen the match ends. Next time around, \G- picks up the second hyphen because it's the next character; the only way that can happen (except at the beginning of the string) is if a previous match broke off at that point.
Be aware that this regex is more flavor dependent than most; I tested it in Java, but not all flavors support \G and lookbehinds.