How can I make this regex shorter

Let's say I have a line of text like this:
アイウエオカキクケコサシスセソタチツテトナニヌネノハヒフヘホマミムメモヤユヨラリルレロワヲンヴガギグゲゴザジズゼゾダヂヅデドバビブベボパピプペポァィゥェォャュョッアイウエオカキクケコサシスセソタチツテトナアイウエオカキクケコサシスセソタチツテトナニヌネノハヒフヘホマミムメモヤユヨラリルレロワヲンヴガギグゲゴザジズゼゾダヂヅデドバビブベボパピプペポァィゥェォャュョッアイウエオカキクケコサシスセソタチツテトナアイウエオカキクケコサシスセソタチツテトナニヌネノハヒフヘホマミムメモヤユヨラリルレロワヲンヴガギグ
I want to verify whether the input is katakana, so I use this regex:
'/^[゠ ァ ア ィ イ ゥ ウ ェ エ ォ オ カ ガ キ ギ ク グ ケ ゲ コ ゴ サ ザ シ ジ ス ズ セ ゼ ソ ゾ タ ダ チ ヂ ッ ツ ヅ テ デ ト ド ナ ニ ヌ ネ ノ ハ バ パ ヒ ビ ピ フ ブ プ ヘ ベ ペ ホ ボ ポ マ ミ ム メ モ ャ ヤ ュ ユ ョ ヨ ラ リ ル レ ロ ヮ ワ ヰ ヱ ヲ ン ヴ ヵ ヶ ヷ ヸ ヹ ヺ ・ ー ヽ ヾ ヿ｟ ｠ ｡ ｢ ｣ ､ ･ ｦ ｧ ｨ ｩ ｪ ｫ ｬ ｭ ｮ ｯ ｰ ｱ ｲ ｳ ｴ ｵ ｶ ｷ ｸ ｹ ｺ ｻ ｼ ｽ ｾ ｿ ﾀ ﾁ ﾂ ﾃ ﾄ ﾅ ﾆ ﾇ ﾈ ﾉ ﾊ ﾋ ﾌ ﾍ ﾎ ﾏ ﾐ ﾑ ﾒ ﾓ ﾔ ﾕ ﾖ ﾗ ﾘ ﾙ ﾚ ﾛ ﾜ ﾝ ﾞ]+$/'
Is there some way to compact that?
I know it's hard-coded. Before that I used ^[ァ-ヴーｧ-ﾝﾞﾟ]+$, but it does not work in a Laravel request rule.

Your regex ァ-ヴーｧ-ﾝﾞﾟ is correct; you just need to add the /u modifier to make it work.
So the correct regex is:
/^[ァ-ヴーｧ-ﾝﾞﾟ]+$/u
or, as an example in Laravel validation:
'name' => 'required|regex:/^[ァ-ヴーｧ-ﾝﾞﾟ]+$/u',
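The /u modifier enables Unicode (UTF-8) support, so the pattern and the input are read as characters rather than as individual bytes.
You can also use hexadecimal Unicode code points as a regex range. An example for katakana is ([\u30a0-\u30ff]*), but in PHP's PCRE the \uXXXX form must be written as \x{XXXX}, like this (note that the class below also contains a space, so spaces are accepted too):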
'name' => 'required|regex:/^[\x{30a0}-\x{30ff} ]+$/u',
Also, you can check this gist for other katakana and hiragana regexes. Examples:
Regex for matching full-width Katakana (zenkaku 全角)
([ァ-ン])
Regex for matching half-width Katakana (hankaku 半角)
([ｧ-ﾝﾞﾟ])
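For comparison, here is a minimal Python sketch of the same check. It assumes Python 3, where str patterns are matched by code point, so no counterpart of /u is needed. The names KATAKANA, KATAKANA_HEX and is_katakana are made up for this illustration. The second pattern spells out the same class with hexadecimal code-point escapes.

```python
import re

# Same class as the answer: full-width ァ-ヴ, the long-vowel mark ー,
# half-width ｧ-ﾝ and the half-width voicing marks ﾞ and ﾟ.
KATAKANA = re.compile(r"[ァ-ヴーｧ-ﾝﾞﾟ]+")

# The same class written with hexadecimal code points.
KATAKANA_HEX = re.compile(r"[\u30a1-\u30f4\u30fc\uff67-\uff9d\uff9e\uff9f]+")


def is_katakana(text: str) -> bool:
    """Return True if text is non-empty and every character is in the class."""
    return KATAKANA.fullmatch(text) is not None
```

Using fullmatch plays the role of the ^ and $ anchors in the original pattern.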

Related

Match semantic version in string by regex

The definition of a valid "semantic version" can be found here.
I've got the official regex from semver, but I want to match the whole version without capture groups, including when it appears as a separate word inside other text. For example, from "Adobe Flash Player 3.20.1 version" I want to match only 3.20.1.
How should I modify this regex to get this information?
^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$
Here is the regex101 test: https://regex101.com/r/YDXnbS/1
Here's the list of examples that should match. Note that the first example should match only the version 3.20.1:
Valid Semantic Versions
Adobe Flash Player 3.20.1 version
0.0.4
1.2.3
10.20.30
1.1.2-prerelease+meta
1.1.2+meta
1.1.2+meta-valid
1.0.0-alpha
1.0.0-beta
1.0.0-alpha.beta
1.0.0-alpha.beta.1
1.0.0-alpha.1
1.0.0-alpha0.valid
1.0.0-alpha.0valid
1.0.0-alpha-a.b-c-somethinglong+build.1-aef.1-its-okay
1.0.0-rc.1+build.1
2.0.0-rc.1+build.123
1.2.3-beta
10.2.3-DEV-SNAPSHOT
1.2.3-SNAPSHOT-123
1.0.0
2.0.0
1.1.7
2.0.0+build.1848
2.0.1-alpha.1227
1.0.0-alpha+beta
1.2.3----RC-SNAPSHOT.12.9.1--.12+788
1.2.3----R-S.12.9.1--.12+meta
1.2.3----RC-SNAPSHOT.12.9.1--.12
1.0.0+0.build.1-rc.10000aaa-kk-0.1
99999999999999999999999.999999999999999999.99999999999999999
1.0.0-0A.is.legal
Invalid Semantic Versions
1
1.2
1.2.3-0123
1.2.3-0123.0123
1.1.2+.123
+invalid
-invalid
-invalid+invalid
-invalid.01
alpha
alpha.beta
alpha.beta.1
alpha.1
alpha+beta
alpha_beta
alpha.
alpha..
beta
1.0.0-alpha_beta
-alpha.
1.0.0-alpha..
1.0.0-alpha..1
1.0.0-alpha...1
1.0.0-alpha....1
1.0.0-alpha.....1
1.0.0-alpha......1
1.0.0-alpha.......1
01.1.1
1.01.1
1.1.01
1.2
1.2.3.DEV
1.2-SNAPSHOT
1.2.31.2.3----RC-SNAPSHOT.12.09.1--..12+788
1.2-RC-SNAPSHOT
-1.0.3-gamma+b7718
+justmeta
9.8.7+meta+meta
9.8.7-whatever+meta+meta
99999999999999999999999.999999999999999999.99999999999999999----RC-SNAPSHOT.12.09.1--------------------------------..12

I modified a regex found at iHateRegex so that the match must begin at the start of the input or after a space, (?<=^| ), and must end at the end of the input or before a space, (?=$| ). This works for all your examples:
(?<=^| )(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?(?=$| )
See live demo.
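A rough Python sketch of the same idea follows. One substitution to be aware of: Python's re module does not accept the variable-width lookbehind (?<=^| ), so this sketch uses (?<!\S) and (?!\S) instead. Those treat any whitespace (tabs and newlines too) as a boundary, not only spaces. The groups are made non-capturing so that the match is the whole version. SEMVER_IN_TEXT and find_version are names invented here.

```python
import re

SEMVER_IN_TEXT = re.compile(
    r"(?<!\S)"                                           # start of text or after whitespace
    r"(?:0|[1-9]\d*)\.(?:0|[1-9]\d*)\.(?:0|[1-9]\d*)"    # major.minor.patch
    r"(?:-(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)"      # optional pre-release ...
    r"(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*)?"  # ... with dot-separated identifiers
    r"(?:\+[0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*)?"          # optional build metadata
    r"(?!\S)"                                            # end of text or before whitespace
)


def find_version(text):
    """Return the first whitespace-delimited semantic version in text, or None."""
    match = SEMVER_IN_TEXT.search(text)
    return match.group(0) if match else None
```

Because the boundaries are lookarounds, the surrounding whitespace is never part of the returned match.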

Enforcing strong passwords in Kohana Auth

I am trying to enforce strong(er) passwords in my Kohana application using Auth, by using the following regex to require at least one upper case letter, one lower case, one number, one non-alphanum (special character), and a minimum of 8 characters.
^(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9])(?=.*[^A-Za-z0-9]).{8,}$
The regex is working, as can be seen on Rubular. Here's the code I'm using in Kohana's Model_Auth_User, which extends ORM.
public function rules() {
    return array(
        'password' => array(
            array('not_empty'),
            array('min_length', array(':value', 8)),
            array('regex', array(':value', '/^(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9])(?=.*[^A-Za-z0-9]).{8,}$/'))
        )
    );
}
However, when creating a new user account, or changing the password of an existing one, this regex seems to be completely ignored. The min_length from the line above is working fine though!
It will stop me from using test as a password because it's less than 8 characters, but testing123 doesn't give any sort of error message.
Any ideas why this is happening and a way around it?

Figured it out - you have to add the regex to the get_password_validation function (in the same Model) or it doesn't output any error message.
public static function get_password_validation($values) {
    return Validation::factory($values)
        ->rule('password', 'min_length', array(':value', 8))
        ->rule('password', 'regex', array(':value', '/^(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9])(?=.*[^A-Za-z0-9]).{8,}$/'))
        ->rule('password_confirm', 'matches', array(':validation', ':field', 'password'));
}
If you add it there, the regex in the rules() function needs to be removed, otherwise it's not possible to log in: rules() runs the regex check on the hashed string, which doesn't contain any special characters.
Hope this helps someone.
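To see why the check belongs on the plain-text password and not on the stored value, here is a small Python sketch of the same pattern. The SHA-256 hex digest is only a stand-in for a stored hash (an assumption; the real hash format depends on the Auth configuration), and is_strong is a name invented for the example.

```python
import hashlib
import re

# Same lookahead pattern as in the question: at least one upper-case letter,
# one lower-case letter, one digit, one non-alphanumeric character, 8+ chars.
STRONG = re.compile(
    r"^(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9])(?=.*[^A-Za-z0-9]).{8,}$"
)


def is_strong(password: str) -> bool:
    """Return True if password satisfies every lookahead and the length rule."""
    return STRONG.match(password) is not None


# Stand-in for a stored hash (assumption): lower-case hex digits only, so it
# has neither an upper-case letter nor a special character.
hashed = hashlib.sha256(b"Testing123!").hexdigest()
```

Here is_strong("Testing123!") passes, while both is_strong("testing123") and is_strong(hashed) fail, which mirrors the login problem described above.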

Regular Expression to catch email encoded attachments string

I would like to write a regular expression that captures the encoded attachment part of an email MIME message string (eml). For example, this is part of an email:
<div dir=3D"ltr"><br clear=3D"all"><div><div dir=3D"ltr"><div style=3D"dire=
ction:rtl">-------------</div><div style=3D"direction:rtl">=D7=91=D7=91=D7=
=A8=D7=9B=D7=94,</div><div style=3D"direction:rtl">=D7=90=D7=91=D7=99=D7=A2=
=D7=93 =D7=9B=D7=94=D7=9F</div></div></div>
</div>
--20cf3003bc2e044e980500f755dc--
--20cf3003bc2e044e9d0500f755de
Content-Type: text/plain; charset=US-ASCII; name="EhudBanay.txt"
Content-Disposition: attachment; filename="EhudBanay.txt"
Content-Transfer-Encoding: base64
X-Attachment-Id: f_hz0z4us30
aHR0cHM6Ly93d3cuZmFjZWJvb2suY29tL2wucGhwP3U9aHR0cHMlM0ElMkYlMkZ3d3cucmFwaWRz
aGFyZS5jb20lMkZmaWxlcyUyRjM4NzAxNTA2MDclMkZFaHVkX0JhbmFpXy1fVGlwX1RpcGFfXzE5
OThfLnJhciZoPTdBUUZRb0RMQQ0KDQpodHRwczovL3d3dy5mYWNlYm9vay5jb20vbC5waHA/dT1o
dHRwcyUzQSUyRiUyRnd3dy5yYXBpZHNoYXJlLmNvbSUyRmZpbGVzJTJGMzk5MzMyNjg1MSUyRkVo
dWRfQmFuYWlfLV9UYWhhdF9TaWFoX0hhWWFzbWluXzE5ODkucmFyJmg9QkFRRWhJY3djDQoNCmh0
dHBzOi8vd3d3LmZhY2Vib29rLmNvbS9sLnBocD91PWh0dHBzJTNBJTJGJTJGd3d3LnJhcGlkc2hh
cmUuY29tJTJGZmlsZXMlMkYzMjQwMTM5MTMyJTJGRWh1ZF9CYW5haV8tX1Jlc2lzZXlfTGFpbGFf
MjAxMS5yYXImaD1RQVFHN0pGWXUNCg0KaHR0cHM6Ly93d3cuZmFjZWJvb2suY29tL2wucGhwP3U9
aHR0cHMlM0ElMkYlMkZ3d3cucmFwaWRzaGFyZS5jb20lMkZmaWxlcyUyRjE5NTE2ODA4MTglMkZF
aHVkX0JhbmFpXy1fT2RfTWVhdF9fMTk5Nl8ucmFyJmg9YUFRRUVuaUIxDQoNCmh0dHBzOi8vd3d3
LmZhY2Vib29rLmNvbS9sLnBocD91PWh0dHBzJTNBJTJGJTJGd3d3LnJhcGlkc2hhcmUuY29tJTJG
ZmlsZXMlMkYyMjc2NTc5MTgzJTJGRWh1ZF9CYW5haV8tX0thcm92X18xOTg5Xy5yYXImaD1mQVFH
a2dYVXENCg0KaHR0cHM6Ly93d3cuZmFjZWJvb2suY29tL2wucGhwP3U9aHR0cHMlM0ElMkYlMkZ3
d3cucmFwaWRzaGFyZS5jb20lMkZmaWxlcyUyRjQwOTg0NjQzNjYlMkZFaHVkX0JhbmFpXy1fSGFT
aGxpc2hpX18xOTkyXy5yYXImaD1GQVFGNjRmY3gNCg0KaHR0cHM6Ly93d3cuZmFjZWJvb2suY29t
L2wucGhwP3U9aHR0cHMlM0ElMkYlMkZ3d3cucmFwaWRzaGFyZS5jb20lMkZmaWxlcyUyRjMxNDY1
NDc2OTElMkZFaHVkX0JhbmFpXy1fRWh1ZF9CYW5haV9WZUhhUGxpdGltX18xOTg3X19GLnBhcnQy
LnJhciZoPUJBUUVoSWN3Yw0KDQpodHRwczovL3d3dy5mYWNlYm9vay5jb20vbC5waHA/dT1odHRw
cyUzQSUyRiUyRnd3dy5yYXBpZHNoYXJlLmNvbSUyRmZpbGVzJTJGMjYwNDg2Njc1MiUyRkVodWRf
QmFuYWlfLV9FaHVkX0JhbmFpX1ZlSGFQbGl0aW1fXzE5ODdfX0YucGFydDEucmFyJmg9REFRSHpG
LXZBDQoNCmh0dHBzOi8vd3d3LmZhY2Vib29rLmNvbS9sLnBocD91PWh0dHBzJTNBJTJGJTJGd3d3
LnJhcGlkc2hhcmUuY29tJTJGZmlsZXMlMkYyNjQxMzIwNzg2JTJGRWh1ZF9CYW5haV8tX0Ryb3Bz
X09mX1RoZV9OaWdodF9fMjAxMV8ucmFyJmg9Y0FRRlRZQ1pTDQoNCmh0dHBzOi8vd3d3LmZhY2Vi
b29rLmNvbS9sLnBocD91PWh0dHBzJTNBJTJGJTJGd3d3LnJhcGlkc2hhcmUuY29tJTJGZmlsZXMl
MkYzMTQ3NzUzNzAwJTJGRWh1ZCUyNTIwQmFuYWklMjUyMC0lMjUyMEtlZXAlMjUyMERyaXZpbmcu
cGFydDEucmFyJmg9S0FRRWtPUkZTDQoNCmh0dHBzOi8vd3d3LmZhY2Vib29rLmNvbS9sLnBocD91
PWh0dHBzJTNBJTJGJTJGd3d3LnJhcGlkc2hhcmUuY29tJTJGZmlsZXMlMkYxNzc1NDI5NDY3JTJG
RWh1ZF9CYW5haV8tX0FuZV9MaV9fMjAwNF8ucmFyJmg9dkFRRWlEWXFu
--20cf3003bc2e044e9d0500f755de--
I would like to capture only this part:
aHR0cHM6Ly93d3cuZmFjZWJvb2suY29tL2wucGhwP3U9aHR0cHMlM0ElMkYlMkZ3d3cucmFwaWRz
aGFyZS5jb20lMkZmaWxlcyUyRjM4NzAxNTA2MDclMkZFaHVkX0JhbmFpXy1fVGlwX1RpcGFfXzE5
OThfLnJhciZoPTdBUUZRb0RMQQ0KDQpodHRwczovL3d3dy5mYWNlYm9vay5jb20vbC5waHA/dT1o
dHRwcyUzQSUyRiUyRnd3dy5yYXBpZHNoYXJlLmNvbSUyRmZpbGVzJTJGMzk5MzMyNjg1MSUyRkVo
dWRfQmFuYWlfLV9UYWhhdF9TaWFoX0hhWWFzbWluXzE5ODkucmFyJmg9QkFRRWhJY3djDQoNCmh0
dHBzOi8vd3d3LmZhY2Vib29rLmNvbS9sLnBocD91PWh0dHBzJTNBJTJGJTJGd3d3LnJhcGlkc2hh
cmUuY29tJTJGZmlsZXMlMkYzMjQwMTM5MTMyJTJGRWh1ZF9CYW5haV8tX1Jlc2lzZXlfTGFpbGFf
MjAxMS5yYXImaD1RQVFHN0pGWXUNCg0KaHR0cHM6Ly93d3cuZmFjZWJvb2suY29tL2wucGhwP3U9
aHR0cHMlM0ElMkYlMkZ3d3cucmFwaWRzaGFyZS5jb20lMkZmaWxlcyUyRjE5NTE2ODA4MTglMkZF
aHVkX0JhbmFpXy1fT2RfTWVhdF9fMTk5Nl8ucmFyJmg9YUFRRUVuaUIxDQoNCmh0dHBzOi8vd3d3
LmZhY2Vib29rLmNvbS9sLnBocD91PWh0dHBzJTNBJTJGJTJGd3d3LnJhcGlkc2hhcmUuY29tJTJG
ZmlsZXMlMkYyMjc2NTc5MTgzJTJGRWh1ZF9CYW5haV8tX0thcm92X18xOTg5Xy5yYXImaD1mQVFH
a2dYVXENCg0KaHR0cHM6Ly93d3cuZmFjZWJvb2suY29tL2wucGhwP3U9aHR0cHMlM0ElMkYlMkZ3
d3cucmFwaWRzaGFyZS5jb20lMkZmaWxlcyUyRjQwOTg0NjQzNjYlMkZFaHVkX0JhbmFpXy1fSGFT
aGxpc2hpX18xOTkyXy5yYXImaD1GQVFGNjRmY3gNCg0KaHR0cHM6Ly93d3cuZmFjZWJvb2suY29t
L2wucGhwP3U9aHR0cHMlM0ElMkYlMkZ3d3cucmFwaWRzaGFyZS5jb20lMkZmaWxlcyUyRjMxNDY1
NDc2OTElMkZFaHVkX0JhbmFpXy1fRWh1ZF9CYW5haV9WZUhhUGxpdGltX18xOTg3X19GLnBhcnQy
LnJhciZoPUJBUUVoSWN3Yw0KDQpodHRwczovL3d3dy5mYWNlYm9vay5jb20vbC5waHA/dT1odHRw
cyUzQSUyRiUyRnd3dy5yYXBpZHNoYXJlLmNvbSUyRmZpbGVzJTJGMjYwNDg2Njc1MiUyRkVodWRf
QmFuYWlfLV9FaHVkX0JhbmFpX1ZlSGFQbGl0aW1fXzE5ODdfX0YucGFydDEucmFyJmg9REFRSHpG
LXZBDQoNCmh0dHBzOi8vd3d3LmZhY2Vib29rLmNvbS9sLnBocD91PWh0dHBzJTNBJTJGJTJGd3d3
LnJhcGlkc2hhcmUuY29tJTJGZmlsZXMlMkYyNjQxMzIwNzg2JTJGRWh1ZF9CYW5haV8tX0Ryb3Bz
X09mX1RoZV9OaWdodF9fMjAxMV8ucmFyJmg9Y0FRRlRZQ1pTDQoNCmh0dHBzOi8vd3d3LmZhY2Vi
b29rLmNvbS9sLnBocD91PWh0dHBzJTNBJTJGJTJGd3d3LnJhcGlkc2hhcmUuY29tJTJGZmlsZXMl
MkYzMTQ3NzUzNzAwJTJGRWh1ZCUyNTIwQmFuYWklMjUyMC0lMjUyMEtlZXAlMjUyMERyaXZpbmcu
cGFydDEucmFyJmg9S0FRRWtPUkZTDQoNCmh0dHBzOi8vd3d3LmZhY2Vib29rLmNvbS9sLnBocD91
PWh0dHBzJTNBJTJGJTJGd3d3LnJhcGlkc2hhcmUuY29tJTJGZmlsZXMlMkYxNzc1NDI5NDY3JTJG
RWh1ZF9CYW5haV8tX0FuZV9MaV9fMjAwNF8ucmFyJmg9dkFRRWlEWXFu
I didn't manage to do it with a regular expression, so I tried to anchor on the line that follows it, which always starts with "--" and ends with "--". Notice that there is always an empty line before the wanted part.
I tried "(\s).*(--)$", but it returns only that following line.
Can someone please help?

It sounds like you're trying to parse a multipart/mixed email. There are libraries in most languages to do this already. If you want to write your own, I'd suggest following the structure of a multipart message:
1. Find the boundary defined in the Content-Type header.
2. Split the message into parts delimited by the boundary (prefixed by --).
3. For each part, seek to the first instance of two consecutive line breaks, which denotes the end of the headers.
While regular expressions might be helpful for some parts of this, I'm not sure they're the right tool for parsing a structured message.
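As an illustration of the library route, here is a minimal sketch using Python's standard email package. The sample message RAW, its boundary and the function name extract_attachments are all invented for the example; a real .eml file would be read from disk instead.

```python
from email import message_from_string, policy

# A tiny made-up multipart/mixed message with one base64 attachment.
RAW = (
    "MIME-Version: 1.0\r\n"
    'Content-Type: multipart/mixed; boundary="BOUNDARY"\r\n'
    "\r\n"
    "--BOUNDARY\r\n"
    "Content-Type: text/plain; charset=US-ASCII\r\n"
    "\r\n"
    "body text\r\n"
    "--BOUNDARY\r\n"
    'Content-Type: text/plain; charset=US-ASCII; name="note.txt"\r\n'
    'Content-Disposition: attachment; filename="note.txt"\r\n'
    "Content-Transfer-Encoding: base64\r\n"
    "\r\n"
    "aGVsbG8gd29ybGQ=\r\n"
    "--BOUNDARY--\r\n"
)


def extract_attachments(raw: str):
    """Return (filename, encoded_text, decoded_bytes) for each attachment part."""
    message = message_from_string(raw, policy=policy.default)
    found = []
    for part in message.walk():
        if part.get_content_disposition() != "attachment":
            continue
        encoded = part.get_payload().strip()       # the base64 text as sent
        decoded = part.get_payload(decode=True)    # bytes after base64 decoding
        found.append((part.get_filename(), encoded, decoded))
    return found
```

The parser takes care of locating the boundary, splitting the parts and separating headers from the body, so no regular expression is needed for the structure itself.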

You can use this regex:
\n\s*\n\K(?:[^-]{2})*.?(?=\n--)
Online demo

Pattern doesn't remove special characters which are by themselves on a website

I am currently taking user input in the form of a URL, fetching the page, and printing the other pages that the website links to. The package that I am using is:
LWP::Simple
I take the link from the command line via $ARGV[0] and store it in a variable.
Then I make another variable and call get on the variable where I stored the URL.
Then I make an array variable and apply this regex to the fetched content:
/\shref="?([^\s>"]+)/gi;
The array stores the matches found in the result of the get call. Then I use a foreach loop over the array to print the results.
However, while it does print links, it also ends up printing standalone special characters such as / and # when there is nothing after them.
So if there is something like /blabalbla, it prints that, but if there is just a standalone special character such as /, \, or #, it prints that too. Is there any way I can modify the regex so that special characters with nothing following them are not printed? I'm new to Perl and not very good at regex.

I can't help you with your specific problem without further information, but in the meantime I suggest that you look at HTML::LinkExtor, which was written for this purpose.
Here's some example code and its output. It lists only <a> elements that have an href attribute.
use strict;
use warnings;
use 5.010;
use LWP;
use HTML::LinkExtor;
my $ua = LWP::UserAgent->new;
my $resp = $ua->get('http://www.bbc.co.uk/');
# Passing the response's base URL makes the extractor return absolute links
my $extor = HTML::LinkExtor->new(undef, $resp->base);
$extor->parse($resp->decoded_content);
for my $link ($extor->links) {
    # Each link is an array reference: the tag name followed by attribute pairs
    my ($tag, %attr) = @$link;
    next unless $tag eq 'a' and $attr{href};
    say $attr{href};
}
output
http://m.bbc.co.uk
http://www.bbc.co.uk/
http://www.bbc.co.uk/#h4discoveryzone
http://www.bbc.co.uk/accessibility/
https://ssl.bbc.co.uk/id/status
http://www.bbc.co.uk/news/
http://www.bbc.com/news/
http://www.bbc.co.uk/sport/
http://www.bbc.co.uk/weather/
http://shop.bbc.com/
http://www.bbc.com/earth/
http://www.bbc.com/travel/
http://www.bbc.com/capital/
http://www.bbc.co.uk/iplayer/
http://www.bbc.com/culture/
http://www.bbc.com/autos/
http://www.bbc.com/future/
http://www.bbc.co.uk/tv/
http://www.bbc.co.uk/radio/
http://www.bbc.co.uk/cbbc/
http://www.bbc.co.uk/cbeebies/
http://www.bbc.co.uk/arts/
http://www.bbc.co.uk/ww1/
http://www.bbc.co.uk/food/
http://www.bbc.co.uk/history/
http://www.bbc.co.uk/learning/
http://www.bbc.co.uk/music/
http://www.bbc.co.uk/science/
http://www.bbc.co.uk/nature/
http://www.bbc.com/earth/
http://www.bbc.co.uk/local/
http://www.bbc.co.uk/travel/
http://www.bbc.co.uk/a-z/
http://www.bbc.co.uk/#orb-footer
http://search.bbc.co.uk/search
http://www.bbc.co.uk/privacy/cookies/managing/cookie-settings.html
http://www.bbc.co.uk/locator/default/desktop/en-GB?ptrt=%2F
http://www.bbc.co.uk/#
http://www.bbc.co.uk/#
http://www.bbc.co.uk/weather/2643743?day=0
http://www.bbc.co.uk/weather/2643743?day=0
http://www.bbc.co.uk/weather/2643743?day=1
http://www.bbc.co.uk/weather/2643743?day=1
http://www.bbc.co.uk/weather/2643743?day=2
http://www.bbc.co.uk/weather/2643743?day=2
http://www.bbc.co.uk/locator/default/desktop/en-GB?ptrt=%2F
http://www.bbc.co.uk/weather/2643743
http://www.bbc.co.uk/news/science-environment-30311816
http://www.bbc.co.uk/news/science-environment-30311822
http://www.bbc.co.uk/news/science-environment-30311818
http://www.bbc.co.uk/news/magazine-30282261
http://www.bbc.co.uk/news/science-environment-30311816
http://www.bbc.co.uk/news/uk-politics-30291460
http://www.bbc.co.uk/news/
http://www.bbc.co.uk/news/uk-england-kent-30319549
http://www.bbc.co.uk/news/world-europe-30306106
http://www.bbc.co.uk/news/world-europe-30306992
http://www.bbc.co.uk/news/uk-30306145
http://www.bbc.co.uk/news/local/
http://www.bbc.co.uk/news/england/london/
http://www.bbc.co.uk/news/uk-england-london-30308694
http://www.bbc.co.uk/news/uk-england-london-30315650
http://www.bbc.co.uk/news/uk-england-london-30321504
http://www.bbc.co.uk/sport/live/football/29959148
http://www.bbc.co.uk/sport/0/
http://www.bbc.co.uk/sport/live/snooker/29618359
http://www.bbc.co.uk/sport/football/30204433
http://www.bbc.co.uk/sport/cricket/30308980
http://www.bbc.co.uk/sport/football/30204434
http://www.bbc.co.uk/sport/0/football/
http://www.bbc.co.uk/sport/football/30204459
http://www.bbc.co.uk/sport/football/30204511
http://www.bbc.co.uk/sport/football/28647040
http://www.bbc.co.uk/?dzf=sport
http://www.bbc.co.uk/?dzf=entertainment
http://www.bbc.co.uk/?dzf=bbcnow
http://www.bbc.co.uk/?dzf=entertainment
http://www.bbc.co.uk/?dzf=news
http://www.bbc.co.uk/?dzf=lifestyle
http://www.bbc.co.uk/?dzf=knowledge
http://www.bbc.co.uk/?dzf=sport
http://www.bbc.co.uk/news/
http://www.bbc.com/news/
http://www.bbc.co.uk/sport/
http://www.bbc.co.uk/weather/
http://shop.bbc.com/
http://www.bbc.com/earth/
http://www.bbc.com/travel/
http://www.bbc.com/capital/
http://www.bbc.co.uk/iplayer/
http://www.bbc.com/culture/
http://www.bbc.com/autos/
http://www.bbc.com/future/
http://www.bbc.co.uk/tv/
http://www.bbc.co.uk/radio/
http://www.bbc.co.uk/cbbc/
http://www.bbc.co.uk/cbeebies/
http://www.bbc.co.uk/arts/
http://www.bbc.co.uk/ww1/
http://www.bbc.co.uk/food/
http://www.bbc.co.uk/history/
http://www.bbc.co.uk/learning/
http://www.bbc.co.uk/music/
http://www.bbc.co.uk/science/
http://www.bbc.co.uk/nature/
http://www.bbc.com/earth/
http://www.bbc.co.uk/local/
http://www.bbc.co.uk/travel/
http://www.bbc.co.uk/a-z/
http://www.bbc.co.uk/
http://www.bbc.co.uk/terms/
http://www.bbc.co.uk/aboutthebbc/
http://www.bbc.co.uk/privacy/
http://www.bbc.co.uk/privacy/cookies/about
http://www.bbc.co.uk/accessibility/
http://www.bbc.co.uk/guidance/
http://www.bbc.co.uk/contact/
http://www.bbc.co.uk/bbctrust/
http://www.bbc.co.uk/complaints/
http://www.bbc.co.uk/help/web/links/
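The same parser-based approach can be sketched in Python with the standard library. This is only an illustration: AnchorCollector and extract_links are invented names, and skipping hrefs that consist solely of /, \ or # is an assumption about what the question wants filtered out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorCollector(HTMLParser):
    """Collect absolute href values from <a> tags only."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Assumption: ignore missing hrefs and ones made up only of / \ #
        if not href or href.strip("/\\#") == "":
            return
        self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    """Return the hrefs of all <a> elements in html, resolved against base_url."""
    collector = AnchorCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links
```

Filtering on the parsed attribute value keeps the "is this a real link" rule separate from the job of finding attributes, which is the part a regex tends to get wrong.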

symfony form validation clean with regex before validate with regex

I'm using Symfony 1.4 and am a little stuck regarding form validation. I have a validator like the one below:
$this->setValidator('mobile_number', new sfValidatorAnd(array(
    new sfValidatorString(array('max_length' => 13)),
    new sfValidatorRegex(
        array('pattern' => '/^07\d{9}$/'),
        array('invalid' => 'Invalid mobile number.')
    ),
)));
That is a simple regex for matching a UK mobile phone number.
However, my problem is that if someone submits a string like "07 90 44 65 48 1", the regex fails, even though the number would be valid if the string were first cleaned of whitespace.
My problem is that I don't know where within the symfony form framework I would accomplish this.
I need to strip everything but numbers from the user input and then use my mobile_number validator.
Any ideas would be greatly appreciated. Thanks.

You may be able to do this with a combination of standard validators, but it might well be easiest to construct your own custom validator. There is a guide to this on the symfony website: http://www.symfony-project.org/more-with-symfony/1_4/en/05-Custom-Widgets-and-Validators#chapter_05_building_a_simple_widget_and_validator
I think it should probably look something like this:
class sfValidatorMobilePhone extends sfValidatorBase
{
    protected function doClean($value)
    {
        // Strip whitespace before checking the number
        $value = preg_replace('/\s/', '', $value);
        if (
            (0 !== strpos($value, '07')) ||
            (13 < strlen($value)) ||
            (0 !== preg_match('/[^\d]/', $value))
        )
        {
            throw new sfValidatorError($this, 'invalid', array('value' => $value));
        }
        else
        {
            return $value;
        }
    }
}
Save this as lib/validator/sfValidatorMobilePhone.class.php. You could then call it as
$this->setValidator('mobile_number', new sfValidatorMobilePhone());

I don't know Symfony, so I don't know how you would go about cleaning the input. If you can do a regex-based search-and-replace somehow, you can search for /\D+/ and replace it with nothing; this removes everything except digits from your string. Careful: it would also remove a leading +, which might be relevant.
If you can't do a "cleaning step" before the validation, you could try validating it like this:
/^\D*07(?:\D*\d){9}\D*$/
This will match any string that contains exactly 11 digits (and arbitrarily many non-digit characters), the first two of which need to be 07.
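Both options can be compared in a short Python sketch. The function names are invented for the illustration. The tolerant pattern here is written with (?:\D*\d){9}, so that each of the remaining nine digits may be preceded by non-digit characters.

```python
import re

STRICT = re.compile(r"07\d{9}")                  # the original rule, for cleaned input
TOLERANT = re.compile(r"\D*07(?:\D*\d){9}\D*")   # same rule, ignoring non-digits


def clean_number(raw: str) -> str:
    """Remove every non-digit character (this also drops a leading +)."""
    return re.sub(r"\D+", "", raw)


def is_valid_after_cleaning(raw: str) -> bool:
    """Clean first, then apply the strict 07 + nine digits rule."""
    return STRICT.fullmatch(clean_number(raw)) is not None


def is_valid_tolerant(raw: str) -> bool:
    """Validate the raw input directly, without a separate cleaning step."""
    return TOLERANT.fullmatch(raw) is not None
```

The two are not fully equivalent: the tolerant pattern requires 0 and 7 to be adjacent, so "0 7 90 44 65 48 1" passes after cleaning but fails the tolerant check.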
