Need help with Regular Expression to Match Blood Group - regex

I'm trying to come up with a regex that helps me validate a Blood Group field - which should accept only A[+-], B[+-], AB[+-] and O[+-].
Here's the regex I came up with (and tested using Regex Tester):
[A|B|AB|O][\+|\-]
Now this pattern successfully matches A,B,O[+-] but fails against AB[+-].
Can anyone please suggest a regex that'll serve my purpose?
Thanks,
m^e

Try:
(A|B|AB|O)[+-]
Using square brackets defines a character class, which can only be a single character. The parentheses create a grouping which allows it to do what you want. You also don't need to escape the +- in the character class, as they don't have their regexy meaning inside of it.
As you mentioned in the comments, if it is a string you want to match against that has the exact values you are looking for, you might want to do this:
^(A|B|AB|O)[+-]$
Without the start of string and end of string anchors, things like "helloAB+asdads" would match.

The brackets [] denote a character class, meaning "any of the characters herein". You want the parentheses () for grouping:
(A|B|AB|0)(\+|-)

When you are building an alternation (e.g. (A|B|AB|O)), you should be careful with the ordering of the elements. Many regex engines will stop at the first alternate that matches (rather than the longest). If it weren't for the [-+] forcing a backtrack, (A|B|AB|O)[-+] would not work for "AB+". It is probably better to say (AB|A|B|O)[-+] (but you should check the docs for your regex engine).
Also, if you do not intend to capture the antigen for latter use, you should you use the non-capturing grouping parentheses: (?:AB|A|B|O)[-+].
Furthermore, if you want to ensure that the only thing in the string is a blood type then you need anchors to prevent it from matching only part of the string: ^(?:AB|A|B|O)[-+]$. A quick note on anchors, Depending on your regex engine, ^ may match the beginning of a line rather than the beginning of the string if you pass it a multiline-match option. Similarly, $ may match the end of a line rather than the end of a string. For this reason there are three other anchors in common (but not %100) usage: \A, \Z, and \z. If your regex engine supports them, \A always matches the start of the string, \Z matches the end of the string or a newline just before the end of the string, and \z matches only the send of the string.

For case insensitive within html pattern attribute you may try this
([AaBbOo]|[Aa][Bb])[\+-]
<input type="text" maxlength="3" pattern="([AaBbOo]|[Aa][Bb])[\+-]" required />

^(A|B|AB|O)[+-]?$
This will produce the correct out put.

Related

Regex Email validation with some special cases [duplicate]

I am trying to make a regex match which is discarding the lookahead completely.
\w+([-+.]\w+)*#\w+([-.]\w+)*\.\w+([-.]\w+)*
This is the match and this is my regex101 test.
But when an email starts with - or _ or . it should not match it completely, not just remove the initial symbols. Any ideas are welcome, I've been searching for the past half an hour, but can't figure out how to drop the entire email when it starts with those symbols.
You can use the word boundary near # with a negative lookbehind to check if we are at the beginning of a string or right after a whitespace, then check if the 1st symbol is not inside the unwanted class [^\s\-_.]:
(?<=^|\s)[^\s\-_.]\w*(?:[-+.]\w+)*\b#\w+(?:[-.]\w+)*\.\w+(?:[-.]\w+)*
See demo
List of matches:
support#github.com
s.miller#mit.edu
j.hopking#york.ac.uk
steve.parker#soft.de
info#company-hotels.org
kiki#hotmail.co.uk
no-reply#github.com
s.peterson#mail.uu.net
info-bg#software-software.software.academy
Additional notes on usage and alternative notation
Note that it is best practice to use as few escaped chars as possible in the regex, so, the [^\s\-_.] can be written as [^\s_.-], with the hyphen at the end of the character class still denoting a literal hyphen, not a range. Also, if you plan to use the pattern in other regex engines, you might find difficulties with the alternation in the lookbehind, and then you can replace (?<=\s|^) with the equivalent (?<!\S). See this regex:
(?<!\S)[^\s_.-]\w*(?:[-+.]\w+)*\b#\w+(?:[-.]\w+)*\.\w+(?:[-.]\w+)*
And last but not least, if you need to use it in JavaScript or other languages not supporting lookarounds, replace the (?<!\S)/(?<=\s|^) with a (non)capturing group (\s|^), wrap the whole email pattern part with another set of capturing parentheses and use the language means to grab Group 1 contents:
(\s|^)([^\s_.-]\w*(?:[-+.]\w+)*\b#\w+(?:[-.]\w+)*\.\w+(?:[-.]\w+)*)
See the regex demo.
I use this for multiple email addresses, separate with ‘;':
([A-Za-z0-9._%-]+#[A-Za-z0-9.-]+\.[A-Za-z]{2,4};)*
For a single mail:
[A-Za-z0-9._%-]+#[A-Za-z0-9.-]+\.[A-Za-z]{2,4}

How to end a string with $ directly after .* with a RegEx?

I'm trying to report on a set of URLs that catches all potential URL parameters and I'm having an issue defining the RegEx properly.
We have this RegEx to capture a few variations of our URLs to feed into our reporting but I need to be able to end the string with a $ but when I do, it doesn't show any results.
The RegEx:
/join/$|/join/\?product.*|/join/\.*
For another account, we only use one variation which is outlined below (which works):
^/join/$
I believe the issue is in that after \?product.*, I'm not ending the string (or even starting it).
So far I have tried: ^/join/$|(^[/join/\?product.*]$)|(^[/join/\.*]$) with no luck.
If you want to match the dollar sign literally you have to escape it \$ or else it would mean an anchor to assert the end of the string / line.
This pattern ^/join/$ would therefore only match /join/
In your pattern you use an alternation where the last part /join/\.* would match /join/ but also /join/..... because when you escape the dot you will match it literally and the * quantifier repeats 0+ times.
Perhaps you are looking for:
^/join/(?:\?product.*\$)?$
This will match /join/ followed by an optional part (?:\?product.*\$)? that will match ?product, followed by any char 0+ times and will end on $.
Regex demo
Please, make the pattern lazy and $ is a special character for regex so need to escape that. (Regarding escaping part, google analytics may follow something else.) [] is used to capture a character in a range, be careful with that as well, as you are trying to capture a group I think.
\?product.*?\$

Mixing Lookahead and Lookbehind in 1 Regexp

I'm trying to match first occurrence of window.location.replace("http://stackoverflow.com") in some HTML string.
Especially I want to capture the URL of the first window.location.replace entry in whole HTML string.
So for capturing URL I formulated this 2 rules:
it should be after this string: window.location.redirect("
it should be before this string ")
To achieve it I think I need to use lookbehind (for 1st rule) and lookahead (for 2nd rule).
I end up with this Regex:
.+(?<=window\.location\.redirect\(\"?=\"\))
It doesn't work. I'm not even sure that it legal to mix both rules like I did.
Can you please help me with translating my rules to Regex? Other ways of doing this (without lookahead(behind)) also appreciated.
The pattern you wrote is really not the one you need as it matches something very different from what you expect: text window.location.redirect("=") in text window.location.redirect("=") something. And it will only work in PCRE/Python if you remove the ? from before \" (as lookbehinds should be fixed-width in PCRE). It will work with ? in .NET regex.
If it is JS, you just cannot use a lookbehind as its regex engine does not support them.
Instead, use a capturing group around the unknown part you want to get:
/window\.location\.redirect\("([^"]*)"\)/
or
/window\.location\.redirect\("(.*?)"\)/
See the regex demo
No /g modifier will allow matching just one, first occurrence. Access the value you need inside Group 1.
The ([^"]*) captures 0+ characters other than a double quote (URLs you need should not have it). If these URLs you have contain a ", you should use the second approach as (.*?) will match any 0+ characters other than a newline up to the first ").

Match pattern anywhere in string?

I want to match the following pattern:
Exxxx49 (where x is a digit 0-9)
For example, E123449abcdefgh, abcdefE123449987654321 are both valid. I.e., I need to match the pattern anywhere in a string.
I am using:
^*E[0-9]{4}49*$
But it only matches E123449.
How can I allow any amount of characters in front or after the pattern?
Remove the ^ and $ to search anywhere in the string.
In your case the * are probably not what you intended; E[0-9]{4}49 should suffice. This will find an E, followed by four digits, followed by a 4 and a 9, anywhere in the string.
I would go for
^.*E[0-9]{4}49.*$
EDIT:
since it fullfills all requirements state by OP.
"[match] Exxxx49 (where x is digit 0-9)"
"allow for any amount of characters in front or after pattern"
It will match
^.* everything from, including the beginning of the line
E[0-9]{4}49 the requested pattern
.*$ everthing after the pattern, including the the end of the line
Your original regex had a regex pattern syntax error at the first *. Fix it and change it to this:
.*E\d{4}49.*
This pattern is for matching in engines (most engines) that are anchored, like Java. Since you forgot to specify a language.
.* matches any number of sequences. As it surrounds the match, this will match the entire string as long as this match is located in the string.
Here is a regex demo!
Just simply use this:
E[0-9]{4}49
How do I allow for any amount of characters in front or after pattern? but it only matches E123449
Use global flag /E\d{4}49/g if supported by the language
OR
Try with capturing groups (E\d{4}49)+ that is grouped by enclosing inside parenthesis (...)
Here is online demo

Why do I get successful but empty regex matches?

I'm searching the pattern (.*)\\1 on the text blabl with regexec(). I get successful but empty matches in regmatch_t structures. What exactly has been matched?
The regex .* can match successfully a string of zero characters, or the nothing that occurs between adjacent characters.
So your pattern is matching zero characters in the parens, and then matching zero characters immediately following that.
So if your regex was /f(.*)\1/ it would match the string "foo" between the 'f' and the first 'o'.
You might try using .+ instead of .*, as that matches one or more instead of zero or more. (Using .+ you should match the 'oo' in 'foo')
\1 is the backreference typically used for replacement later or when trying to further refine your regex by getting a match within a match. You should just use (.*), this will give you the results you want and will automatically be given the backreference number 1. I'm no regex expert but these are my thoughts based on my limited knowledge.
As an aside, I always revert back to RegexBuddy when trying to see what's really happening.
\1 is the "re-match" instruction. The question is, do you want to re-match immediately (e.g., BLABLA)
/(.+)\1/
or later (e.g., BLAahemBLA)
/(.+).*\1/