Need Regex Help

Can anybody help me with a regex? I have a string of letters and digits like:
X024123099XYAAXX99RR
I need a regex to check whether a user has entered the correct information. The rule should also have a fallback: the input is checked from left to right, so a correct but still incomplete input is accepted.
For example, when tested these inputs should return TRUE:
X024
X024123099X
X024123099XYA
X024123099XYAAXX99R
And these ones should return FALSE:
XX024
X02412AA99X
X024123099XYAAXX9911
And so on. The regex must check for the correct syntax, beginning from the left.
I have something like this, but it doesn't seem to be correct:
\w\d{0,12}\w{0,6}\d{0,2}\w{0,2}
Big thanks for any help (I'm new to regex)

You could take OpenSauce's regex and then hack it to pieces to allow partial matches:
^[A-Z](\d{0,9}$|\d{9}([A-Z]{0,6}$|[A-Z]{6}(\d{0,2}$|\d{2}([A-Z]{0,2}$))))
It's not pretty but as far as I can tell it encodes your requirements.
Essentially I took each case of something like \d{9} and replaced it with something like (\d{0,9}$|\d{9}<rest of regex>).
I added ^ and $ because otherwise it will match substrings in an otherwise invalid string. For example, it will see an invalid string like XX024 and think it is okay because it contains X024.
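A quick way to check the nested pattern against the examples from the question is a small Python sketch (Python's `re` accepts this pattern unchanged; `input_ok` is just an illustrative name):

```python
import re

# The nested partial-match pattern from the answer above, copied unchanged.
PARTIAL = re.compile(
    r"^[A-Z](\d{0,9}$|\d{9}([A-Z]{0,6}$|[A-Z]{6}(\d{0,2}$|\d{2}([A-Z]{0,2}$))))"
)

def input_ok(text: str) -> bool:
    """True if text is a correct (possibly incomplete) start of e.g. X024123099XYAAXX99RR."""
    return PARTIAL.match(text) is not None

for s in ["X024", "X024123099X", "X024123099XYA", "X024123099XYAAXX99R",
          "XX024", "X02412AA99X", "X024123099XYAAXX9911"]:
    print(s, "OK" if input_ok(s) else "not OK")
```

One Python-specific caveat: `$` also matches just before a trailing newline, so `\Z` is the stricter choice if the input may end in `\n`.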

If I understand you correctly, your strings should match the regex
[A-Z]\d{9}[A-Z]{6}\d{2}[A-Z]{2}
but you also want to check if a string could be a prefix of a matching string, is that correct? You might be able to express this in a single regex, but I can't think of a way to do so that's easy to read.
You haven't said which language you're using, but if your language gives you a way to tell whether the end of the input string was reached while checking the regex, that gives you an easy way to get what you want. E.g. in Java, Matcher.hitEnd() tells you whether the end of the input was reached during the last match attempt, so the code below:
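import java.util.regex.Matcher;
import java.util.regex.Pattern;
import static java.lang.System.out;
public class PrefixCheck {  // class name is arbitrary
    static Pattern pattern = Pattern.compile( "[A-Z]\\d{9}[A-Z]{6}\\d{2}[A-Z]{2}" );
    static Matcher matcher = pattern.matcher( "" );  // reused between calls, so not thread-safe
    public static void main(String[] args) {
        String[] strings = {
            "X024",
            "X024123099X",
            "X024123099XYA",
            "X024123099XYAAXX99R",
            "XX024",
            "X02412AA99X",
            "X024123099XYAAXX9911"
        };
        for ( String string : strings ) {
            out.format( "%s %s%n", string, inputOK(string) ? "OK" : "not OK" );
        }
    }
    // OK if the whole input matches, or if the failed match ran into the end of the
    // input (i.e. more characters might still turn it into a match).
    static boolean inputOK(String input) {
        return matcher.reset(input).matches() || matcher.hitEnd();
    }
}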
gives output:
X024 OK
X024123099X OK
X024123099XYA OK
X024123099XYAAXX99R OK
XX024 not OK
X02412AA99X not OK
X024123099XYAAXX9911 not OK

Related

Why is this seemingly correct Regex not working correctly in Rascal?

I have the following code:
set[str] noNnoE = { v | str v <- eu, (/\b[^eEnN]*\b/ := v) };
The goal is to filter out of a set of strings (called 'eu'), those strings that have no 'e' or 'n' in them (both upper- and lowercase). The regular expression I've provided:
/\b[^eEnN]?\b/
seems to work as it should when I try it out in an online regex tester.
When I try it out in the Rascal terminal, it doesn't seem to work:
rascal>/\b[^eEnN]*\b/ := "Slander";
bool: true
I expected no match. What am I missing here? I'm using the latest (stable) Rascal release in Eclipse Oxygen1a.
Actually, the online regex-tester is giving the same match that we are giving. You can look at the match as follows:
if (/<w1:\b[^eEnN]?\b>/ := "Slander")
println("The match is: |<w1>|");
This is assigning the matched string to w1 and then printing it between the vertical bars, assuming the match succeeds (if it doesn't, it returns false, so the body of the if will not execute). If you do this, you will get back a match to the empty string:
The match is: ||
The online regex tester says the same thing:
Match 1
Full match 0-0 ''
If you want to prevent this, you can force at least one occurrence of the characters you are looking for by using + instead of ?:
rascal>/\b[^eEnN]+\b/ := "Slander";
bool: false
Note that you can also make the regex match case insensitive by following it with an i, like so:
/\b[^en]+\b/i
This may make it easier to write if you need to add more characters into the character class.
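The empty match is easy to reproduce outside Rascal. A small Python sketch, assuming `re.search` is a fair stand-in for `:=` here (like the behaviour shown above, it succeeds if the pattern matches anywhere in the string):

```python
import re

word = "Slander"

# [^eEnN]? may match zero characters, so \b...\b succeeds on the empty string at position 0.
m = re.search(r"\b[^eEnN]?\b", word)
print(repr(m.group()), m.span())

# With + at least one allowed character must sit between the two word boundaries.
print(re.search(r"\b[^eEnN]+\b", word))

# The same check using the case-insensitive flag instead of listing both cases.
print(re.search(r"\b[^en]+\b", word, re.IGNORECASE))
```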
This solution (/\b[^en]+\b/i) doesn't work for strings consisting of more than one word, such as "the Czech Republic".
Try /\b[^en]+\b$/i. That seems to work for me.
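Note that `$` only anchors the end of the match, so `\b[^en]+\b$` can still succeed on the last word of a string whose earlier words contain an e or n. The Python sketch below shows this with a made-up input, "Eric Bob", and compares it with anchoring both ends (`re.fullmatch`, equivalent to `^[^en]*$`), which checks every character of the string:

```python
import re

def suffix_check(s: str) -> bool:
    # The pattern from the comment above: only the END of the match is anchored.
    return re.search(r"\b[^en]+\b$", s, re.IGNORECASE) is not None

def full_check(s: str) -> bool:
    # Anchor both ends, so no character anywhere in s may be e, E, n or N.
    return re.fullmatch(r"[^en]*", s, re.IGNORECASE) is not None

for s in ["Slander", "the Czech Republic", "Eric Bob", "Bob Smith"]:
    print(s, suffix_check(s), full_check(s))
```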

Perl: Help writing a regular expression

I am trying to write a common regular expression for the three cases below:
Supernatural_S07E23_720p_HDTV_X264-DIMENSION.mkv
the.listener.313.480p.hdtv.x264-2hd.mkv
How.I.met.your.mother.s02e07.hdtv.x264-xor.avi
My regular expression should remove the series name from the original string, i.e. the output for the strings above should be:
S07E23_720p_HDTV_X264-DIMENSION.mkv
313.480p.hdtv.x264-2hd.mkv
s02e07.hdtv.x264-xor.avi
For the basic Supernatural case I wrote the regex below, and it worked fine, but it fails as soon as the series name has multiple words.
$string =~ s/^(.*?)[\.\_\- ]//i; #delimiter can be (. - _ )
So I have no idea how to proceed for the cases above. I was thinking along the lines of \w+{1,6}, but that didn't do what I needed either.
PS: Explanation of what the regular expression is doing will be appreciated.
You can check whether the token after each . contains a digit; if it doesn't, treat it as part of the name.
HOWEVER, I personally think there is no perfect solution for this. It would still run into problems with names like:
24.313.480p.hdtv.x264-2hd.mkv // 24
Warehouse.13.s02e07.hdtv.x264-xor.avi // warehouse 13
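The digit heuristic described above could look like this in Python. It is only a sketch: it assumes `.`, `_`, `-` and space are the delimiters (as in the question's regex) and keeps everything from the first token that contains a digit; `strip_series_name` is a hypothetical name.

```python
import re

# A token is a run of characters other than the delimiters . _ - and space
# (the same delimiters the question's s/^(.*?)[\.\_\- ]//i uses).
TOKEN = re.compile(r"[^._\- ]+")

def strip_series_name(filename: str) -> str:
    """Keep everything from the first token that contains a digit."""
    for tok in TOKEN.finditer(filename):
        if any(ch.isdigit() for ch in tok.group()):
            return filename[tok.start():]
    return filename  # no digit anywhere: leave the string unchanged

for name in ["Supernatural_S07E23_720p_HDTV_X264-DIMENSION.mkv",
             "the.listener.313.480p.hdtv.x264-2hd.mkv",
             "How.I.met.your.mother.s02e07.hdtv.x264-xor.avi",
             "Warehouse.13.s02e07.hdtv.x264-xor.avi"]:
    print(strip_series_name(name))
```

The last example shows the limitation mentioned above: the 13 from Warehouse.13 ends up in the suffix.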
As StanleyZ said, you'll always get into trouble with names containing numbers.
But if you set these special cases aside, you can try:
#!/usr/bin/perl
use strict;
use warnings;
$\ = $/;    # append a newline to every print
for (qw!
    Supernatural_S07E23_720p_HDTV_X264-DIMENSION.mkv
    the.listener.313.480p.hdtv.x264-2hd.mkv
    How.I.met.your.mother.s02e07.hdtv.x264-xor.avi
!) {
    # $1 = series name, $2 = suffix starting at a token made only of S/E/digits (case-insensitive)
    if (/^([\w\.]+)[\.\_]([SE\d]+[\.\_].*)$/i) {
        print "Match : Name='$1' Suffix='$2'";
    } else {
        print "Did not match $_";
    }
}
which outputs:
Match : Name='Supernatural' Suffix='S07E23_720p_HDTV_X264-DIMENSION.mkv'
Match : Name='the.listener' Suffix='313.480p.hdtv.x264-2hd.mkv'
Match : Name='How.I.met.your.mother' Suffix='s02e07.hdtv.x264-xor.avi'
Note: aren't you doing something illegal? ;)

Flex 3 Regular Expression Problem

I've written a URL validator for a project I am working on. It works well for my requirements, except that it breaks when the last part of the URL is 22 characters or longer. My expression:
/((https?):\/\/)([^\s.]+.)+([^\s.]+)(:\d+\/\S+)/i
It expects input that looks like "http(s)://hostname:port/location".
When I give it the input:
https://demo10:443/111112222233333444445
it works, but if I pass the input
https://demo10:443/1111122222333334444455
it breaks. You can test it out easily at http://ryanswanson.com/regexp/#start. Oddly, I can't reproduce the problem with just the relevant (I would think) part /(:\d+\/\S+)/i. I can have as many characters after the required / and it works great. Any ideas or known bugs?
Edit:
Here is some code for a sample application that demonstrates the problem:
<mx:Application xmlns:mx="http://www.adobe.com/2006/mxml" layout="absolute">
    <mx:Script>
        <![CDATA[
            private function click():void {
                var value:String = input.text;
                var matches:Array = value.match(/((https?):\/\/)([^\s.]+.)+([^\s.]+)(:\d+\/\S+)/i);
                if(matches == null || matches.length < 1 || matches[0] != value) {
                    area.text = "No Match";
                }
                else {
                    area.text = "Match!!!";
                }
            }
        ]]>
    </mx:Script>
    <mx:TextInput x="10" y="10" id="input"/>
    <mx:Button x="178" y="10" label="Button" click="click()"/>
    <mx:TextArea x="10" y="40" width="233" height="101" id="area"/>
</mx:Application>
I debugged your regular expression in RegexBuddy, and apparently it takes millions of steps to find a match. This usually means that something is terribly wrong with the regular expression.
Look at ([^\s.]+.)+([^\s.]+)(:\d+\/\S+).
1- It seems like you're trying to match subdomains too, but it doesn't work as intended since you didn't escape the dot. If you escape it, demo10:443/123 won't match because it'll need at least one dot. Change ([^\s.]+\.)+ to ([^\s.]+\.)* and it'll work.
2- [^\s.]+ is a poor choice of character class here: it matches all the way to the end of the string and then starts backtracking from there. You can avoid this by using [^\s:.], which stops at the colon.
This one should work as you want:
https?:\/\/([^\s:.]+\.)*([^\s:.]+):\d+\/\S+
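A quick Python check of the corrected pattern, using `re.fullmatch` in place of the Flex code's `matches[0] == value` comparison. The `www.example.com` URL and the 200-character path are made-up test inputs:

```python
import re

# The corrected pattern from the answer above (the \/ escapes are harmless in Python).
URL = re.compile(r"https?:\/\/([^\s:.]+\.)*([^\s:.]+):\d+\/\S+", re.IGNORECASE)

def is_valid_url(text: str) -> bool:
    # fullmatch plays the role of the Flex code's matches[0] == value check.
    return URL.fullmatch(text) is not None

for url in ["https://demo10:443/111112222233333444445",
            "https://demo10:443/1111122222333334444455",
            "https://demo10:443/" + "1" * 200,
            "http://www.example.com:8080/path",
            "https://demo10/path"]:
    print(is_valid_url(url), url[:45])
```

The last URL has no port, so it is rejected, as the pattern requires one.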
This is a bug, either in Ryan's implementation or within Flex/Flash.
The regular expression above (minus the surrounding slashes and flags) is also valid Python syntax, and Python gives the following output:
# the case-insensitive flag is left out, as it doesn't matter here
>>> import re
>>> rx = re.compile(r'((https?):\/\/)([^\s.]+.)+([^\s.]+)(:\d+\/\S+)')
>>> print(rx.match('https://demo10:443/1111122222333334444455').groups())
('https://', 'https', 'demo1', '0', ':443/1111122222333334444455')

Regex - If contains '%', can only contain '%20'

I am wanting to create a regular expression for the following scenario:
If a string contains the percent character (%), then every % must be part of %20, and it cannot be preceded by another '%'.
So a string containing %25, for instance, would be rejected. The following string would be valid:
http://www.test.com/?&Name=My%20Name%20Is%20Vader
But these would fail:
http://www.test.com/?&Name=My%20Name%20Is%20VadersAccountant%25
%%%25
Any help would be greatly appreciated,
Kyle
EDIT:
The scenario in a nutshell: a link is written in an encoded state and then launched via JavaScript. No decoding works. I tried .NET decoding and JS decoding, with the same result in both cases: the link stays encoded when executed.
Doesn't require a %:
/^[^%]*(%20[^%]*)*$/
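A quick Python check of this pattern against the strings from the question (Python's `re` accepts it unchanged, minus the surrounding slashes; `is_allowed` is an illustrative name):

```python
import re

# Text without %, then any number of "%20 followed by text without %".
ONLY_PERCENT_20 = re.compile(r"^[^%]*(%20[^%]*)*$")

def is_allowed(s: str) -> bool:
    return ONLY_PERCENT_20.match(s) is not None

for s in ["http://www.test.com/?&Name=My%20Name%20Is%20Vader",
          "http://www.test.com/?&Name=My%20Name%20Is%20VadersAccountant%25",
          "%%%25",
          "no percent sign at all"]:
    print(is_allowed(s), s)
```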
Which language are you using?
Most languages have a URI encoder/decoder function or class.
I would suggest decoding the string first and then checking for valid (or invalid) characters,
e.g. with something like /[\w ]/ (the blank is a space).
If you use a regex on the encoded string directly, you need to take into account that www.example.com/index.html?user=admin&pass=%%250 means that the pass really is "%250".
Another solution if look-arounds are not available:
^([^%]|%([013-9a-fA-F][0-9a-fA-F]|2[1-9a-fA-F]))*$
Reject the string if it matches %[^2][^0]
I think that would find what you need
/^([^%]|%%|%20)+$/
Edit: Added case where %% is valid string inside URI
Edit2: And fixed it for case where it should fail :-)
Edit3:
If you need to use it in an editor (which would explain why you can't use a more programmatic approach), you have to escape the special characters correctly; for example, in Vim the regex should look like this:
/^\([^%]\|%%\|%20\)\+$/
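For comparison, here is the unescaped form of this pattern in Python. Unlike a pattern that only allows %20, it also accepts %%25 (treated here as a literal %25), and because of the + it rejects the empty string:

```python
import re

# Every character is a non-%, a "%%" pair, or "%20".
PERCENT_RULE = re.compile(r"^([^%]|%%|%20)+$")

def is_allowed(s: str) -> bool:
    return PERCENT_RULE.match(s) is not None

for s in ["http://www.test.com/?&Name=My%20Name%20Is%20Vader",
          "http://www.test.com/?&Name=My%20Name%20Is%20VadersAccountant%25",
          "%%%25",
          "%%25",
          ""]:
    print(is_allowed(s), repr(s))
```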
Maybe a better approach is to deal with that validation after you decode that string:
string name = HttpUtility.UrlDecode(Request.QueryString["Name"]);
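A rough Python counterpart of decode-then-validate, using the standard library's `urllib.parse` in place of `HttpUtility.UrlDecode`. The `name_is_valid` helper and the "only word characters and spaces" rule (the /[\w ]/ idea mentioned earlier in the thread) are assumptions for illustration:

```python
import re
from urllib.parse import parse_qs, urlsplit

def name_is_valid(url: str) -> bool:
    """Hypothetical rule: the decoded Name value may contain only word characters and spaces."""
    # parse_qs percent-decodes values, so %20 becomes a space and %25 becomes %.
    values = parse_qs(urlsplit(url).query).get("Name", [])
    return all(re.fullmatch(r"[\w ]*", v) for v in values)

print(name_is_valid("http://www.test.com/?&Name=My%20Name%20Is%20Vader"))
print(name_is_valid("http://www.test.com/?&Name=My%20Name%20Is%20VadersAccountant%25"))
```

Validating after decoding means the rule is written in terms of the real characters rather than their escaped forms.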
/^([^%]|%20)*$/
This requires a test against the "bad" patterns. If we're allowing %20, we don't need to make sure it exists.
As others have said before, %% is valid too, and %%25 would be %25.
The regex below matches anything that doesn't fit the rules above:
/(?<![^%]%)%(?!(20|%))/
The lookbehind at the start checks the characters before the % (to tell a %% pair apart from %%%). The regex then matches a %, and the lookahead checks that it is not followed by 20 or by another %.
This means that if anything is identified by the regex, then you should probably reject it.
I agree with dominic's comment on the question. Don't use Regex.
If you want to avoid scanning the string twice, you can iteratively search for % and check that each one is followed by 20 and nothing else. (Update: a % followed directly by another % is allowed, so a sequence like %%25 is read as a literal %25.)
# Python version of the original pseudocode
def is_valid(s):
    pos = s.find('%')
    while pos != -1:
        if s[pos + 1:pos + 2] == '%':
            pos += 2      # "%%" is a literal, skip ahead
        elif s[pos + 1:pos + 3] == '20':
            pos += 3      # "%20" is allowed, move past it
        else:
            return False  # any other % makes the string invalid
        pos = s.find('%', pos)
    return True

Iterative regex matching

We've become fairly adept at generating various regular expressions to match input strings, but we've been asked to try to validate these strings iteratively. Is there an easy way to iteratively match the input string against a regular expression?
Take, for instance, the following regular expression:
[EW]\d{1,3}\.\d
When the user enters "E123.4", the regular expression is met. How do I validate the user's input while they type it? Can I partially match the string "E1" against the regular expression?
Is there some way to say that the input string only partially matched the input? Or is there a way to generate sub-expressions out of the master expression automatically based on string length?
I'm trying to create a generic function that can take any regular expression and throw an exception as soon as the user enters something that cannot meet the expression. Our expressions are rather simple in the grand scheme of things, and we are certainly not trying to parse HTML :)
Thanks in advance.
David
You could do it only by making every part of the regex optional, and repeating yourself:
^([EW]|[EW]\d{1,3}|[EW]\d{1,3}\.|[EW]\d{1,3}\.\d)$
This might work for simple expressions, but for complex ones this is hardly feasible.
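The repetition can at least be generated instead of typed by hand. A Python sketch, assuming you split the expression into pieces yourself (`prefix_regex` is a hypothetical helper). Like the hand-written version, it only accepts partial input that stops at a piece boundary, so a fixed-count piece such as \d{9} would need its own shorter alternatives:

```python
import re

def prefix_regex(pieces):
    """Compile p1|p1p2|...|p1p2...pn, i.e. every prefix of the split-up expression."""
    alternatives = ["".join(pieces[:i]) for i in range(1, len(pieces) + 1)]
    return re.compile("(?:" + "|".join(alternatives) + ")")

# The question's [EW]\d{1,3}\.\d, split into the same pieces as the answer above.
PIECES = ["[EW]", r"\d{1,3}", r"\.", r"\d"]
partial = prefix_regex(PIECES)
full = re.compile("".join(PIECES))

for typed in ["E", "E1", "E123.", "E123.4", "E1234", "EX3"]:
    if full.fullmatch(typed):
        state = "complete"
    elif partial.fullmatch(typed):
        state = "incomplete but valid so far"
    else:
        state = "invalid"
    print(typed, state)
```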
Hard to say... If the user types an "E", that matches the beginning but not the rest. Of course, you don't know whether they will continue to type "123.4" or just hit "Enter" right away (I assume you use "Enter" to indicate the end of input). You could use groups to test that all 3 groups match, such as:
([EW])(\d{1,3})(\.\d)
After the first character, try to match the first group. After the next few inputs, match the first AND second group, and when they enter the '.' and last digit you have to find a match for all 3 groups.
You could use partial matches if your regex lib supports it (as does Boost.Regex).
Adapting the is_possible_card_number example on this page to the example in your question:
#include <boost/regex.hpp>
#include <cassert>
#include <stdexcept>
#include <string>
// Return false for partial match, true for full match, or throw for
// impossible match
bool
CheckPartialMatch(const std::string& Input, const boost::regex& Regex)
{
    boost::match_results<std::string::const_iterator> what;
    if(0 == boost::regex_match(Input, what, Regex, boost::match_default | boost::match_partial))
    {
        // the input so far could not possibly be valid so reject it:
        throw std::runtime_error(
            "Invalid data entered - this could not possibly be a match");
    }
    // OK so far so good, but have we finished?
    if(what[0].matched)
    {
        // excellent, we have a result:
        return true;
    }
    // what we have so far is only a partial match...
    return false;
}
int main()
{
    const boost::regex r("[EW]\\d{1,3}\\.\\d");
    // The input is incomplete, so we expect a "false" result
    assert(!CheckPartialMatch("E1", r));
    // The input completely satisfies the expression, so expect a "true" result
    assert(CheckPartialMatch("E123.4", r));
    try{
        // Input can't match the expression, so expect an exception.
        CheckPartialMatch("EX3", r);
        assert(false);
    }
    catch(const std::runtime_error&){
    }
    return 0;
}