Regular expression that should allow limited special characters

Can anyone tell me a regular expression for a text field that rejects the following characters but accepts other special characters, letters, digits, and so on:
+ - && || ! ( ) { } [ ] ^ " ~ * ? : \ # &

This rejects any string that contains one of the characters listed above anywhere in the string:
^(?!.*[+\-&|!(){}\[\]^"~*?:\\#]).*$
Brief Explanation
^ - assert position at the beginning of a line (at the beginning of the string, or after a line break character in multiline mode)
(?!.*[+\-&|!(){}\[\]^"~*?:\\#]) - negative lookahead: assert that it is impossible to match the pattern inside starting at this position
.* - inside the lookahead, match any single character that is not a line break character, between zero and unlimited times, as many times as possible, giving back as needed (greedy)
[+\-&|!(){}\[\]^"~*?:\\#] - match a single character from the list: "+", "-" (written \-), one of &|!(){}, "[" and "]" (written \[ and \]), one of ^"~*?:, a backslash (written \\), or "#"
.* - after the lookahead, match any single character that is not a line break character, between zero and unlimited times (greedy)
$ - assert position at the end of a line (at the end of the string, or before a line break character in multiline mode)
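For comparison, here is a minimal Python sketch of the same blacklist check. Instead of a negative lookahead it searches for any forbidden character, which gives the same answer. The names FORBIDDEN and is_allowed are made up for illustration, and the backslash is included in the class on the assumption that the question's list (which names \) is the intended blacklist.

```python
import re

# Character class of rejected characters. The backslash is included on the
# assumption that the question's list is the intended blacklist.
FORBIDDEN = re.compile(r'[+\-&|!(){}\[\]^"~*?:\\#]')


def is_allowed(text: str) -> bool:
    """Return True when text contains none of the forbidden characters."""
    return FORBIDDEN.search(text) is None
```

Searching for a single bad character tends to be easier to read than anchoring the whole string around a lookahead, but both approaches accept and reject the same inputs.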

It's usually better to whitelist the characters you allow than to blacklist the characters you don't, both from a security standpoint and from an ease-of-implementation standpoint.
If you do go down the blacklist route, here is an example, but be warned: the syntax is not simple.
http://groups.google.com/group/regex/browse_thread/thread/0795c1b958561a07
If you want to whitelist all the accent characters, perhaps using unicode ranges would help? Check out this link.
http://www.regular-expressions.info/unicode.html
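A whitelist could look something like the following Python sketch. The allowed set here (word characters, space and a few punctuation marks) is purely an assumption for illustration, as are the names ALLOWED and is_whitelisted. In Python 3, \w on a str pattern already matches accented letters, so no explicit Unicode ranges are needed in this particular case.

```python
import re

# Hypothetical whitelist: Unicode letters, digits, underscore, space and a
# few punctuation marks. Adjust the set to whatever the field should accept.
ALLOWED = re.compile(r"[\w .,'@-]*")


def is_whitelisted(text: str) -> bool:
    """Return True when every character of text is in the allowed set."""
    return ALLOWED.fullmatch(text) is not None
```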

I recognize those as the characters which need to be escaped for Solr. If this is the case, and if you are coding in PHP, then you should use my PHP utility functions from Github. Here is one of the Solr functions from there:
/**
 * Escape values destined for Solr
 *
 * @author Dotan Cohen
 * @version 2013-05-30
 *
 * @param value to be escaped. Valid data types: string, array, int, float, bool
 * @return Escaped string, NULL on invalid input
 */
function solr_escape($str)
{
    if ( is_array($str) ) {
        foreach ( $str as &$s ) {
            $s = solr_escape($s);
        }
        return $str;
    }
    if ( is_int($str) || is_float($str) || is_bool($str) ) {
        return $str;
    }
    if ( !is_string($str) ) {
        return NULL;
    }
    $str = addcslashes($str, "+-!(){}[]^\"~*?:\\");
    $str = str_replace("&&", "\\&&", $str);
    $str = str_replace("||", "\\||", $str);
    return $str;
}
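If you are not on PHP, the string branch of that function can be approximated in a few lines. This is only a rough Python sketch of the same idea, not a port of the utility library: it handles strings only, and the names _SOLR_SPECIAL and solr_escape_str are invented for this example.

```python
import re

# Single special characters, plus the two-character operators && and ||.
_SOLR_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\]|&&|\|\|)')


def solr_escape_str(value: str) -> str:
    """Prefix each special character (or && / ||) with a backslash."""
    return _SOLR_SPECIAL.sub(r'\\\1', value)
```

A lone & or | is left untouched, matching the PHP code above, which only escapes the doubled forms.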

Regex Express Return All Chars before a '/' but if there are 2 '/' Return all before that

I have been trying to write a regex that returns the following results for the following inputs.
XX -> XX
XXX -> XXX
XX/XX -> XX
XX/XX/XX -> XX/XX
XXX/XXX/XX -> XXX/XXX
I tried the following regexes, but they do not work.
^[^/]+ => https://regex101.com/r/xvCbNB/1
=========
([A-Z])\w+ => https://regex101.com/r/xvCbNB/2
They are close but are not there.
Any Help would be appreciated.
You want to get all text from the start till the last occurrence of a specific character or till the end of string if the character is missing.
Use
^(?:.*(?=\/)|.+)
Details
^ - start of string
(?:.*(?=\/)|.+) - a non-capturing group that matches either of the two alternatives; if the first one matches, the second won't be tried:
.*(?=\/) - any 0+ chars other than line break chars, as many as possible, up to but excluding the last /
| - or
.+ - any 1+ chars other than line break chars, as many as possible.
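To see the pattern at work against the inputs from the question, here is a small Python sketch. The helper name before_last_slash is made up for this example; the pattern is the one above, with the slash left unescaped because Python does not use / as a regex delimiter.

```python
import re

# Same pattern as above; "/" needs no escaping in Python.
PREFIX = re.compile(r'^(?:.*(?=/)|.+)')


def before_last_slash(text: str) -> str:
    """Return everything before the last "/", or the whole text if none."""
    m = PREFIX.match(text)
    return m.group(0) if m else text
```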
It will be easier to use a replace here to match / followed by non-slash characters before end of line:
Search regex:
/[^/]*$
Replacement String:
""
If you're looking for a regex match then use this regex:
^(.*?)(?:/[^/]*)?$
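Both variants can be tried out with a short Python sketch. The function names are hypothetical; the patterns are the search regex and the match regex given above.

```python
import re


def strip_last_segment(text: str) -> str:
    """Replace approach: delete the last "/" and everything after it."""
    return re.sub(r'/[^/]*$', '', text)


def match_before_last_slash(text: str) -> str:
    """Match approach: group 1 holds the part before the last "/"."""
    m = re.match(r'^(.*?)(?:/[^/]*)?$', text)
    return m.group(1) if m else text
```

For the sample inputs in the question the two functions return the same results.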
Any special reason it has to be a regular expression? How about just splitting the string at the slashes, removing the last item and rejoining:
function removeItemAfterLastSlash(string) {
  const list = string.split(/\//);
  if (list.length === 1) {
    return string;
  }
  list.pop();
  return list.join("/");
}
Or look for the last slash and remove everything from it onward:
function removeItemAfterLastSlash(string) {
  const index = string.lastIndexOf("/");
  if (index === -1) {
    return string;
  }
  // Strings have no splice(); slice() returns the part before the slash.
  return string.slice(0, index);
}
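The same regex-free idea is even shorter in Python, shown here as a sketch with a made-up function name. str.rpartition splits on the last occurrence of the separator and returns an empty separator when there is none.

```python
def remove_item_after_last_slash(text: str) -> str:
    """Drop the last "/" and what follows; return text unchanged if no "/"."""
    head, sep, _tail = text.rpartition("/")
    return head if sep else text
```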

Search for substring and store another part of the string as variable in perl

I am revamping an old mail tool and adding MIME support. I have a lot of it working, but I'm a Perl dummy and the regex stuff is losing me.
I had:
foreach ( @{$body} ) {
    next if /^$/;
    if ( /NEMS/i ) {
        /.*?(\d{5,7}).*/;
        $nems = $1;
        next;
    }
    if ( $delimit ) {
        next if (/$delimit/ && ! $tp);
        last if (/$delimit/ && $tp);
        $tp = 1, next if /text.plain/;
        $tp = 0, next if /text.html/;
        s/<[^>]*>//g;
        $newbody .= $_ if $tp;
    } else {
        s/<[^>]*>//g;
        $newbody .= $_ ;
    }
} # End Foreach
Now I have $body_text as the plain text mail body thanks to MIME::Parser. So now I just need this part to work:
foreach ( @{$body_text} ) {
    next if /^$/;
    if ( /NEMS/i ) {
        /.*?(\d{5,7}).*/;
        $nems = $1;
        next;
    }
} # End Foreach
The actual challenge is to find NEMS=12345 or NEMS=1234567 and set $nems=12345 if found. I think I have a very basic syntax problem with the test because I'm not exposed to Perl very often.
A coworker suggested:
foreach (split(/\n/,$body_text)){
    next if /^$/;
    if ( /NEMS/i ) {
        /.*?(\d{5,7}).*/;
        $nems = $1;
        next;
    }
}
Which seems to be working, but it may not be the preferred way?
edit:
So this is the most current version based on tips here and testing:
foreach (split(/\n/,$body_text)){
    next if /^$/;
    if ( /NEMS/i ) {
        /^\s*NEMS\s*=\s*(\d+)/i;
        $nems = $1;
        next;
    }
}
Make the last two digits optional, capture the first five, and assign the capture directly:
($nems) = /(\d{5}) (?: \d{2} )?/x; # /x allows spaces inside
The construct (?: ) only groups what's inside, without capture. The ? after it means to match that zero or one time. We need parens so that it applies to that subpattern only. So the last two digits are optional -- five digits or seven digits match. I removed the unneeded .*? and .*
However, by what you say it appears that the whole thing can be simplified
if ( ($nems) = /^\s*NEMS \s* = \s* (\d{5}) (?:\d{2})?/ix ) { next }
where there is now no need for if (/NEMS/) and I've adjusted to the clarification that NEMS is at the beginning and that there may be spaces around =. Then you can also say
my $nems;
foreach ( split /\n/, $body_text ) {
    # ...
    next if ($nems) = /^\s*NEMS\s*=\s*(\d{5})(?:\d{2})?/i;
    # ...
}
which includes the clarification that the new $body_text is a multiline string.
$nems clearly needs to be declared outside of the loop, and I indicate that.
This allows yet more digits to follow; it will match on 8 digits as well (but capture only the first five). This is what your trailing .* in the regex implies.
Edit: It's been clarified that there can only be 5 or 7 digits. The regex could then be tightened to check that the input is as expected, but it should work as it stands, too.
A few notes; let me know if more would be helpful:
The match operator returns a list so we need the parens in ($nems) = /.../;
The ($nems) = /.../ syntax is a nice shortcut, for ($nems) = $_ =~ /.../;.
If you are matching on a variable other than $_ then you need the whole thing.
You always want to start Perl programs with
use warnings 'all';
use strict;
This directly helps and generally results in better code.
After further clarification of the problem, all digits following = need to be captured into $nems (there may be 5, 7, 8, 9, or 10 digits, but not 6). Then the regex is simply
($nems) = /^\s*NEMS\s*=\s*(\d+)/i;
where \d+ means a digit, one or more times. So a string of digits (match fails if there are none).
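For readers more at home outside Perl, the final regex can be exercised with a small Python sketch. The function name find_nems is hypothetical, and unlike the Perl loops above (which keep scanning) this version returns the first match it finds.

```python
import re
from typing import Optional

# Same pattern as the final Perl regex: optional leading whitespace, NEMS,
# optional spaces around "=", then one or more digits (captured).
NEMS = re.compile(r'^\s*NEMS\s*=\s*(\d+)', re.IGNORECASE)


def find_nems(body_text: str) -> Optional[str]:
    """Return the digits after NEMS= on the first matching line, else None."""
    for line in body_text.split("\n"):
        m = NEMS.match(line)
        if m:
            return m.group(1)
    return None
```

Checking the match object before reading the group avoids picking up a stale capture from an earlier successful match, which is a risk when $1 is read without testing that the match succeeded.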

Reg expression validate / \ # & characters

I've been learning how regular expressions work, which is very tricky for me. I would like to validate an input field against the characters below. Basically, if the string contains any of these characters, alert('bad chars').
/
\
#
&
I found this code, but when I change it around it doesn't seem to work. How can I alter this code to meet my needs?
var str = $(this).val();
if(/^[a-zA-Z0-9- ]*$/.test(str) == false) {
    alert('bad');
    return false;
} else {
    alert('good');
}
/^[a-zA-Z0-9- ]*$/ means the following:
^ the string MUST start here
[a-zA-Z0-9- ] a letter between a and z upper or lower case, a number between 0 and 9, dashes (-) and spaces.
* repeated 0 or more times
$ the string must end here.
In the case of "any character but" you can use ^ like so: /^[^\/\\#&]*$/. If this matches, the string doesn't contain any of those characters. A ^ right after a [ means match anything that isn't one of the following characters.
You could just try the following:
if(/[\\\/#&]/.test(str) == true) {
    alert('bad');
    return false;
} else {
    alert('good');
}
NOTE: I'm not 100% sure which characters need to be escaped in JavaScript vs. .NET regular expressions, but basically, if your string contains any of the characters \, /, # or &, then alert 'bad'.
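The "contains any of these" check does not depend on the regex flavour much. As an illustration only, here is the same test sketched in Python, with the invented names BAD_CHARS and has_bad_chars; inside the character class only the backslash needs escaping.

```python
import re

# Backslash, forward slash, hash and ampersand.
BAD_CHARS = re.compile(r'[\\/#&]')


def has_bad_chars(text: str) -> bool:
    """Return True when text contains \\, /, # or &."""
    return BAD_CHARS.search(text) is not None
```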

How to validate a string to have only certain letters by perl and regex

I am looking for a Perl regex which will validate a string containing only the letters ACGT. For example "AACGGGTTA" should be valid while "AAYYGGTTA" should be invalid, since the second string has "YY" which is not one of the letters A, C, G, T. I have the following code, but it validates both of the above strings:
if($userinput =~/[A|C|G|T]/i)
{
    $validEntry = 1;
    print "Valid\n";
}
Thanks
Use a character class, and make sure you check the whole string by using the start of string token, \A, and end of string token, \z.
You should also use * or + to indicate how many characters you want to match -- * means "zero or more" and + means "one or more."
Thus, the regex below is saying "between the start and the end of the (case insensitive) string, there should be one or more of the following characters only: a, c, g, t"
if($userinput =~ /\A[acgt]+\z/i)
{
    $validEntry = 1;
    print "Valid\n";
}
Using the character-counting tr operator:
if( $userinput !~ tr/ACGT//c )
{
    $validEntry = 1;
    print "Valid\n";
}
tr/characterset// counts how many characters in the string are in characterset; with the /c flag, it counts how many are not in the characterset. Using !~ instead of =~ negates the result, so it will be true if there are no characters not in characterset or false if there are characters not in characterset.
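The counting idea carries over to other languages. A rough Python counterpart, using the hypothetical names count_non_acgt and is_valid_by_count, counts the characters outside the set and treats a count of zero as valid. Like the tr version, it is case-sensitive and accepts an empty string.

```python
def count_non_acgt(seq: str) -> int:
    """Count the characters of seq that are not one of A, C, G, T."""
    return sum(1 for ch in seq if ch not in "ACGT")


def is_valid_by_count(seq: str) -> bool:
    """Valid when no character falls outside the allowed set."""
    return count_non_acgt(seq) == 0
```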
Your character class [A|C|G|T] contains |. | does not stand for alternation in a character class, it only stands for itself. Therefore, the character class would include the | character, which is not what you want.
Your pattern is not anchored. The pattern /[ACGT]+/ would match any string that contains one or more of any of those characters. Instead, you need to anchor your pattern, so that only strings that contain just those characters from beginning to end are matched.
$ can match a newline. To avoid that, use \z to anchor at the end. \A anchors at the beginning (although it doesn't make a difference whether you use that or ^ in this case, using \A provides a nice symmetry).
So, your check should be written:
if ($userinput =~ /\A [ACGT]+ \z/ix)
{
    $validEntry = 1;
    print "Valid\n";
}
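The anchoring point is easy to demonstrate in Python as well. In this sketch (the names DNA and is_valid_dna are made up), re.fullmatch plays the role of \A and \z: the whole string has to match, and a trailing newline is not tolerated.

```python
import re

# No "|" inside the class: it would be treated as a literal character.
DNA = re.compile(r'[ACGT]+', re.IGNORECASE)


def is_valid_dna(seq: str) -> bool:
    """Return True when seq is one or more of A, C, G, T and nothing else."""
    return DNA.fullmatch(seq) is not None
```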

Need help converting "sassy" to "$a55y" using a regular expression?

Any s at the beginning of the word should be converted to a $.
Any s inside the word should be converted to a 5.
To match an s at the start of the word, use \b to match word boundaries and \w to match alphanumerics:
/\bs\w/
(as @Matthew points out, the \w is really superfluous:)
/\bs/
Once you've replaced all s at the start of a word, then the only remaining ones are inside the word (I'm assuming that you also want to replace s at the end of a word with 5) so you can simply use
/s/
For completeness, here's how to put it all together (I'm going to assume JavaScript):
function pimpMyEsses(str)
{
    return str.replace(/\bs/gi, '$').replace(/s/gi, '5');
}
console.log(pimpMyEsses('slither quantum Sassy. arcades'));
// > "$lither quantum $a55y. arcade5"
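If the target language turns out to be Python rather than JavaScript, the same two-step replacement might look like this sketch (the function name is just a snake_case rendering of the one above):

```python
import re


def pimp_my_esses(text: str) -> str:
    """Replace word-initial s with "$", then every remaining s with "5"."""
    # Step 1: an "s" right after a word boundary becomes "$".
    text = re.sub(r'\bs', '$', text, flags=re.IGNORECASE)
    # Step 2: every "s" that is left is inside or at the end of a word.
    return re.sub(r's', '5', text, flags=re.IGNORECASE)
```

The order of the two steps matters: running the second substitution first would leave no s for the first one to find.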
Depending on the language it may be possible to capture the substitutions with a single regular expression and replace them procedurally. Here's a PHP example:
<?php
$word = 'sassy';
preg_match_all('/\b(s)|([^s]+)|(s)/', $word, $matches, PREG_SET_ORDER);
/* captures:
 * $matches = array(
 *     array('s','s'),
 *     array('a','','a'),
 *     array('s','','','s'),
 *     array('s','','','s'),
 *     array('y','','y')
 * )
 */
$newword = '';
foreach ($matches as $m) {
    if ($m[1]) $newword .= '$';               # leading s --> $
    elseif ($m[2] !== '') $newword .= $m[2];  # not an s --> as-is ("0" is falsy, so compare with '')
    else $newword .= '5';                     # any other s --> 5
}
echo $newword;
Because I've used \b to match a word boundary before the "leading s", the string 'sassy socks' becomes '$a55y $ock5'.
If you want only the s at the start of "sassy" to become a $, change the regular expression to:
'/^(s)|([^s]+)|(s)/'
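The single-pass idea can also be expressed with a replacement callback instead of a loop over the matches. Below is a Python sketch under that assumption; the names leetify and _swap are invented, and the pattern is a trimmed variant of the one above that only matches the s characters and leaves everything else alone.

```python
import re


def _swap(match: "re.Match[str]") -> str:
    # Group 1 is set only when the s sits right after a word boundary.
    return '$' if match.group(1) else '5'


def leetify(text: str) -> str:
    """One pass: word-initial s becomes "$", any other s becomes "5"."""
    return re.sub(r'(\bs)|s', _swap, text, flags=re.IGNORECASE)
```

To restrict the "$" to the very start of the string, the \b in the pattern would be swapped for ^, mirroring the change described above.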
You can use:
/^(s)/ to select only the first "s";
/(?:[^s])(?:(s)[^s]*)+/ to select all the other "s" characters. Note that the first character is skipped (it is matched by the non-capturing group, not captured).
Explanation: ignore the first character; then repeat one or more times: match an "s" and ignore the following characters that are not "s".
Next step: you need to decide which language you will use.