Regex not working in HTML5 pattern

I have this regex, which is intended to accept any text except strings that start with the sequence "34":
^(?!34)(?=([\w]+))
The regex works fine for me at https://regex101.com/r/iN1yN3/2 (check the tests there to see the intended behavior).
Any idea why it isn't working in my form?
<form>
<input pattern="^(?!34)(?=([\w]+))" type="text">
<button type="submit">Submit!</button>
</form>

The pattern attribute has to match the entire string. Assertions check for a match but consume no characters, so they do not count towards the match length, and a pattern made only of assertions can never cover a non-empty value. Replacing the second assertion with a plain \w+ lets the pattern match the entire string.
You can also skip the implied ^, leaving you with just:
<input pattern="(?!34)\w+" type="text">
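A minimal sketch of the behavior described above, assuming the browser checks a value roughly by wrapping the pattern in ^(?: and )$ and compiling it with the u flag. The helper name matchesPattern is hypothetical and exists only for this illustration.

```javascript
// Hypothetical helper: approximates how an <input pattern="..."> value is checked,
// assuming the pattern is wrapped in ^(?: ... )$ and compiled with the u flag.
function matchesPattern(pattern, value) {
  return new RegExp("^(?:" + pattern + ")$", "u").test(value);
}

const original = "^(?!34)(?=([\\w]+))"; // assertions only: consumes no characters
const fixed = "(?!34)\\w+";             // \w+ consumes the whole value

console.log(matchesPattern(original, "abc")); // false: nothing is consumed before $
console.log(matchesPattern(fixed, "abc"));    // true
console.log(matchesPattern(fixed, "34abc"));  // false: starts with 34
console.log(matchesPattern(fixed, "a34"));    // true: 34 is not at the start
```

The original pattern fails for every non-empty value because the trailing $ is tested at position 0, right after the zero-width assertions.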

Related

Pattern attribute value is not a valid regular expression

My HTML has the following input element (it is intended to accept email addresses that end in ".com"):
<input type="email" name="p_email_ad" id="p_email_ad" value="" required="required" pattern="[\-a-zA-Z0-9~!$%\^&*_=+}{\'?]+(\.[\-a-zA-Z0-9~!$%\^&*_=+}{\'?]+)*#([a-zA-Z0-9_][\-a-zA-Z0-9_]*(\.[\-a-zA-Z0-9_]+)*\.([cC][oO][mM]))(:[0-9]{1,5})?$" maxlength="64">
At some point in the past 2 months, Chrome has started returning the following JavaScript error (and preventing submission of the parent form) when validating that input:
Pattern attribute value
[\-a-zA-Z0-9~!$%\^&*_=+}{\'?]+(\.[\-a-zA-Z0-9~!$%\^&*_=+}{\'?]+)*#([a-zA-Z0-9_][\-a-zA-Z0-9_]*(\.[\-a-zA-Z0-9_]+)*\.([cC][oO][mM]))(:[0-9]{1,5})?$
is not a valid regular expression: Uncaught SyntaxError: Invalid
regular expression:
/[\-a-zA-Z0-9~!$%\^&*_=+}{\'?]+(\.[\-a-zA-Z0-9~!$%\^&*_=+}{\'?]+)*#([a-zA-Z0-9_][\-a-zA-Z0-9_]*(\.[\-a-zA-Z0-9_]+)*\.([cC][oO][mM]))(:[0-9]{1,5})?$/: Invalid escape
Regex101.com likes the regex pattern, but Chrome doesn't. What syntax do I have wrong?
Use
pattern="[-a-zA-Z0-9~!$%^&*_=+}{'?]+(\.[-a-zA-Z0-9~!$%^&*_=+}{'?]+)*#([a-zA-Z0-9_][-a-zA-Z0-9_]*(\.[-a-zA-Z0-9_]+)*\.([cC][oO][mM]))(:[0-9]{1,5})?"
The problem is that some characters were escaped that did not need to be, like ' and ^ inside the character classes. Note that - inside a character class may be escaped, but does not have to be when it is at the start of the class.
Note also that HTML5 engines wrap the whole pattern inside ^(?: and )$ constructs, so there is no need for the $ end-of-string anchor at the end of the pattern.
Test:
<form>
<input type="email" name="p_email_ad" id="p_email_ad" value="" required="required" pattern="[-a-zA-Z0-9~!$%^&*_=+}{'?]+(\.[-a-zA-Z0-9~!$%^&*_=+}{'?]+)*#([a-zA-Z0-9_][-a-zA-Z0-9_]*(\.[-a-zA-Z0-9_]+)*\.([cC][oO][mM]))(:[0-9]{1,5})?" maxlength="64">
<input type="Submit">
</form>
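The "Invalid escape" error can be reproduced outside the form. This is a minimal sketch, assuming the browser compiles the pattern with the u (Unicode) flag, under which escapes that are otherwise tolerated get rejected. The helper name compiles is hypothetical.

```javascript
// Hypothetical helper: reports whether a regex source compiles with the given flags.
function compiles(source, flags) {
  try {
    new RegExp(source, flags);
    return true;
  } catch (e) {
    return false; // a SyntaxError such as "Invalid escape"
  }
}

console.log(compiles("[\\']", ""));     // true: tolerated without the u flag
console.log(compiles("[\\']", "u"));    // false: \' is not a valid escape under u
console.log(compiles("[']", "u"));      // true: the quote needs no escaping
console.log(compiles("[\\-a-z]", "u")); // true: \- is allowed inside a class
```

This would also explain why a tester such as regex101 accepts the pattern while the browser does not: the same source is valid in one mode and invalid in the other.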
I was experiencing the same issue with my application but took a slightly different approach to a solution. My regex has the same issue that the accepted answer describes (special characters being escaped in character classes when they didn't need to be). However, the regex I'm dealing with comes from an external source, so I could not modify it. This kind of regex is usually fine in most languages (it passes validation in PHP), but as we have found out, it breaks with HTML5.
My simple solution: URL-encode the regex before applying it to the input's pattern attribute. That seems to satisfy the HTML5 engine, and it works as expected. JavaScript's encodeURIComponent is a good fit.

Regex match 3 characters followed by integers

I am new to regular expressions and I need a regex for the following pattern:
The string must have the format “TCK#”: TCK followed by integers.
For example, TCK123 is acceptable; 123 is not.
Here is my current regex:
<input class="form-control" required="true" type="text" name="TCKInput"
pattern="^[TCK][0-9]$">
With my current code, when the user enters TCK123 it is rejected, which is not what I want.
Change it to the regex below:
^(?:TCK)[0-9]+$
Demo: https://regex101.com/r/h9V7n1/1
Changes to the regex you were using:
1) You had [ and ] around TCK, which makes the regex match any one of the characters inside the brackets. Since you have to match TCK literally, change them to ( and ).
2) You had no + after [0-9], so exactly one digit was matched. With +, one or more digits are matched.
If you want all 3 letters, TCK, followed by one or more digits, then try this:
^TCK\d+$
If you use [TCK], that will only accept a single character: one T, one C, or one K.
Demo
This demo posts to a live test server, so a successful submission returns a response from that server.
<form id='main' action='https://httpbin.org/post' method='post'>
<input class="form-control" required="true" type="text" name="TCKInput" pattern="^TCK\d+$">
<input type='submit'>
</form>
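The difference between the character class and the literal can be checked directly. A minimal sketch, with variable names chosen only for this illustration:

```javascript
// Both patterns are already anchored with ^ and $, so a plain test() is enough.
const charClass = /^[TCK][0-9]$/; // one of T, C or K, then exactly one digit
const literal = /^TCK\d+$/;       // the letters TCK, then one or more digits

console.log(charClass.test("TCK123")); // false
console.log(charClass.test("T1"));     // true
console.log(literal.test("TCK123"));   // true
console.log(literal.test("123"));      // false
console.log(literal.test("TCK"));      // false: at least one digit is required
```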

regular expression exclude match that contains a string pattern

I'm trying to narrow down my regex to ignore form elements with type="submit". I only want to select the portion of each element up to and including the class="*" part, and to ignore the element entirely if type="submit" appears before or after the class.
My regular expression thus far:
(<(?:input|select|textarea){1}.*[^type="submit"]class=")(((?!form\-control)[a-zA-Z0-9_ -])*")
Test case:
Line one should match up to the end of class, and line 2 ignored.
<input type="text" name="name" id="test" class="example-class" max-length="7" required="required">
<input type="submit" class="btn-primary" value="send">
Is this achievable?
Thanks for your comments. The answer was a negative look ahead.
Adding (?!.*type="submit.*) to the start of the regex appears to have given me my desired result.
Working Regex:
(?!.*type="submit.*)(<(?:input|select|textarea).*class=")(((?!form\-control)[a-zA-Z0-9_ -])*")
(<(?:input|select|textarea)\s((?!type="submit")[\w\-]+\b="[^"]*"\s?)*>)
This expression is confined to a single tag.
It is better to avoid expressions like .* because they can run past the end of a tag and produce a match that begins inside one tag and ends inside another.
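Both expressions above can be tried against the two test lines from the question. This is a minimal sketch that applies each regex to one line at a time; the helper name firstMatch is hypothetical.

```javascript
const textInput = '<input type="text" name="name" id="test" class="example-class" max-length="7" required="required">';
const submitInput = '<input type="submit" class="btn-primary" value="send">';

// Lookahead version: refuses to start a match if type="submit" appears later on the line.
const withLookahead = /(?!.*type="submit.*)(<(?:input|select|textarea).*class=")(((?!form\-control)[a-zA-Z0-9_ -])*")/;
// Tag-bound version: walks the tag attribute by attribute and never leaves it.
const tagBound = /(<(?:input|select|textarea)\s((?!type="submit")[\w\-]+\b="[^"]*"\s?)*>)/;

// Hypothetical helper: returns the full matched text, or null when nothing matches.
function firstMatch(regex, line) {
  const m = regex.exec(line);
  return m ? m[0] : null;
}

console.log(firstMatch(withLookahead, textInput));   // <input type="text" name="name" id="test" class="example-class"
console.log(firstMatch(withLookahead, submitInput)); // null
console.log(firstMatch(tagBound, textInput) === textInput); // true: matches the whole tag
console.log(firstMatch(tagBound, submitInput));      // null
```

Note the different scope of the two matches: the lookahead version stops after the class value, while the tag-bound version matches the entire tag.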

How can I write a regular expression to validate my HTML input tag?

I want to specify a pattern for my <input> tag such that it matches any string having length 7, starting with ab- followed by 4 digits.
How would I write a regular expression for this?
The pattern you use in your comment works; just make sure to keep the input inside a <form> tag with a submit button. Note that | is a literal character inside a character class, so [A|a] also accepts |.
<form>
<input type="text" pattern="[A|a][B|b][-][0-9]{4}">
<input type="submit">
</form>
A regex to match ab- followed by 4 digits would look like:
ab-\d{4}
Or for case insensitive matches:
[Aa][Bb]-\d{4}
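The two patterns in this thread can be compared with a small check. A minimal sketch, assuming the pattern attribute is evaluated by wrapping the pattern in ^(?: and )$ with the u flag; the helper name fullMatch is hypothetical.

```javascript
// Hypothetical helper: approximates a whole-value check for <input pattern="...">.
function fullMatch(pattern, value) {
  return new RegExp("^(?:" + pattern + ")$", "u").test(value);
}

const fromComment = "[A|a][B|b][-][0-9]{4}"; // | is a literal inside [...]
const caseInsensitive = "[Aa][Bb]-\\d{4}";

console.log(fullMatch(fromComment, "ab-1234"));      // true
console.log(fullMatch(fromComment, "|b-1234"));      // true: the literal | is accepted
console.log(fullMatch(caseInsensitive, "|b-1234"));  // false
console.log(fullMatch(caseInsensitive, "AB-0042"));  // true
console.log(fullMatch(caseInsensitive, "ab-12345")); // false: length must be exactly 7
```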

Non-greedy regex acts greedily

Here's a simple example:
Text: <input name="zzz" value="18754" type="hidden"><input name="zzz" value="18311" type="hidden"><input name="zzz" value="17138" type="hidden">
Regex: /<input.*?value="(18754|17138)".*?>/
When matches are replaced by an empty string, the result is an empty string. I expected the middle <input> to remain since I am using non-greedy matching (.*?). Could anyone explain why it is removed?
There are two matches:
<input name="zzz" value="18754" type="hidden">
<input name="zzz" value="18311" type="hidden"><input name="zzz" value="17138" type="hidden">
In the second case, the first .*? matches name="zzz" value="18311" type="hidden"><input name="zzz". It's a match and it's non-greedy.
aix already explained why it matches the middle part.
To avoid this behaviour, get rid of the .*? and try this instead:
/<input[^>]*value="(18754|17138)"[^>]*>/
See it here on Regexr
Instead of matching any character, match any character except ">".
aix's answer is correct: the second match includes the 2nd and 3rd input tags.
One possible fix for your regex would be to change . to [^>], like this:
/<input[^>]*?value="(18754|17138)"[^>]*?>/
That will cause it to match any character except >. But that has the obvious problem of breaking whenever > shows up inside a quoted literal. As everyone always says: Regexes aren't designed to work on HTML. Don't use them unless you have no other choice.
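The behavior discussed in this thread can be reproduced with the text from the question. A minimal sketch, with variable names chosen only for this illustration:

```javascript
const text =
  '<input name="zzz" value="18754" type="hidden">' +
  '<input name="zzz" value="18311" type="hidden">' +
  '<input name="zzz" value="17138" type="hidden">';

// Lazy dot: still free to cross a ">" when that is the only way to reach a match.
const lazyDot = /<input.*?value="(18754|17138)".*?>/g;
// Negated class: can never leave the tag it started in.
const negated = /<input[^>]*value="(18754|17138)"[^>]*>/g;

const afterLazyDot = text.replace(lazyDot, "");
const afterNegated = text.replace(negated, "");

console.log(JSON.stringify(afterLazyDot)); // ""
console.log(afterNegated); // <input name="zzz" value="18311" type="hidden">
```

Non-greedy only means the shortest match from a given starting position. The regex engine still starts at the leftmost possible position, which here is the second tag.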