disallow repetition of given set of characters - regex

I need a regex that rejects a string when any two characters from the following set appear next to each other:
". / - ( )"
For example:
123()123 - false
123--123 - false
124((123 - false
123(123)123-12-12 - true
This is what I have done so far:
(?:([\/().-])(?!.*\1))

You can use:
(^(?:(?![.\/()-]{2}).)*$)
Explanation:

^((?![\/().-]{2}).)*$
This simply negates the regex [\/().-]{2} which matches if two of your characters are next to each other.
See this answer for further explanation.

Maybe it is easier to do it the other way around and match the strings you don't want to allow:
if match [.\/()-]{2}
    not allowed
else
    allowed
end
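For comparison, here is a minimal Python sketch of both approaches from the answers above. The helper names `is_allowed_lookahead` and `is_allowed_inverted` are made up for illustration; the patterns are the ones quoted in the answers, written without the unnecessary escapes.

```python
import re

# Two characters from the set . / - ( ) directly next to each other.
ADJACENT = re.compile(r"[./()-]{2}")
# Negated form: no position in the string may start such a pair.
NO_ADJACENT = re.compile(r"^(?:(?![./()-]{2}).)*$")


def is_allowed_lookahead(text: str) -> bool:
    """Accept the string only if the negative-lookahead pattern matches it."""
    return NO_ADJACENT.match(text) is not None


def is_allowed_inverted(text: str) -> bool:
    """Accept the string only if no forbidden pair is found anywhere in it."""
    return ADJACENT.search(text) is None


for sample in ["123()123", "123--123", "124((123", "123(123)123-12-12"]:
    print(sample, is_allowed_lookahead(sample), is_allowed_inverted(sample))
```

Both helpers should agree on single-line input; the inverted search is usually the easier one to read and maintain.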


Shorten Regular Expression (\n) [duplicate]

I'd like to match three-character sequences of letters (only the letters 'a', 'b', 'c' are allowed) separated by commas (the last group is not followed by a comma).
Examples:
abc,bca,cbb
ccc,abc,aab,baa
bcb
I have written the following regular expression:
re.match('([abc][abc][abc],)+', "abc,defx,df")
However, it doesn't work correctly, because for the above example:
>>> print(bool(re.match('([abc][abc][abc],)+', "abc,defx,df")))  # defx in second group
True
>>> print(bool(re.match('([abc][abc][abc],)+', "axc,defx,df")))  # 'x' in first group
False
It seems to check only the first group of three letters and to ignore the rest. How do I write this regular expression correctly?
Try the following regex:
^[abc]{3}(,[abc]{3})*$
^...$ from the start to the end of the string
[...] one of the given characters
...{3} three repetitions of the preceding element
(...)* zero or more repetitions of the group in the parentheses
What you're asking it to find with your regex is "at least one triple of letters a, b, c" - that's what "+" gives you. Whatever follows after that doesn't really matter to the regex. You might want to include "$", which means "end of the line", to be sure that the line must all consist of allowed triples. However in the current form your regex would also demand that the last triple ends in a comma, so you should explicitly code that it's not so.
Try this:
re.match('([abc][abc][abc],)*([abc][abc][abc])$', "abc,defx,df")
This finds any number of allowed triples followed by a comma (maybe zero), then a triple without a comma, then the end of the line.
Edit: including the "^" (start of string) symbol is not necessary, because the match method already checks for a match only at the beginning of the string.
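A small Python 3 sketch of the point made above: the original pattern is satisfied by a valid prefix, while a pattern that must cover the whole string is not. The helper name `is_valid_triples` is hypothetical, and `fullmatch` is used here instead of explicit anchors.

```python
import re

# Whole-string pattern: one triple, then zero or more ",triple" repetitions.
TRIPLES = re.compile(r"[abc]{3}(?:,[abc]{3})*")


def is_valid_triples(data: str) -> bool:
    """Return True only if the entire string is comma-separated abc-triples."""
    return TRIPLES.fullmatch(data) is not None


# The original pattern succeeds as soon as one "triple," prefix is found.
prefix_only = bool(re.match(r"([abc][abc][abc],)+", "abc,defx,df"))

print(prefix_only)
print(is_valid_triples("abc,defx,df"))
print(is_valid_triples("ccc,abc,aab,baa"))
```

Using `fullmatch` avoids having to remember both anchors, and it also rejects a trailing comma.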
The obligatory "you don't need a regex" solution:
all(letter in 'abc,' for letter in data) and all(len(item) == 3 for item in data.split(','))
You need to iterate over the sequence of found values.
import re
data_string = "abc,bca,df"
imatch = re.finditer(r'(?P<value>[abc]{3})(,|$)', data_string)
for match in imatch:
    print(match.group('value'))
So the regex to check whether the whole string matches the pattern will be (note the trailing $; without it "abc,bca,df" would still match on its valid prefix):
data_string = "abc,bca,df"
match = re.match(r'^([abc]{3}(,|$))+$', data_string)
if match:
    print("data string is correct")
Your result is not surprising since the regular expression
([abc][abc][abc],)+
tries to match three characters of [abc] followed by a comma, one or more times, at the start of the string (re.match anchors only the start). Nothing requires the match to reach the end of the string. So the most important part is to make sure that there is nothing more in the string, as scessor suggests by adding ^ (start of string) and $ (end of string) to the regular expression.
An alternative without using regex (albeit a brute force way):
>>> import itertools
>>> def matcher(x):
...     total = ["".join(p) for p in itertools.product(('a','b','c'),repeat=3)]
...     for i in x.split(','):
...         if i not in total:
...             return False
...     return True
...
>>> matcher("abc,bca,aaa")
True
>>> matcher("abc,bca,xyz")
False
>>> matcher("abc,aaa,bb")
False
If your aim is to validate a string as being composed of triplets of the letters a, b and c:
import re
for ss in ("abc,bbc,abb,baa,bbb",
           "acc",
           "abc,bbc,abb,bXa,bbb",
           "abc,bbc,ab,baa,bbb"):
    print(ss, bool(re.match(r'([abc]{3},?)+\Z', ss)))
Result:
abc,bbc,abb,baa,bbb True
acc True
abc,bbc,abb,bXa,bbb False
abc,bbc,ab,baa,bbb False
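\Z means the end of the string. Its presence forces the match to extend to the very end of the string.
Note that because the comma is optional, this pattern also accepts strings such as "abcabc" and "abc,".
By the way, I like Sonya's form too; in a way it is clearer, and it rejects those two cases:
bool(re.match('([abc]{3},)*[abc]{3}\Z',ss))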
To just repeat a sequence of patterns, you need to use a non-capturing group, a (?:...)-like construct, and apply a quantifier right after the closing parenthesis. The question mark and the colon after the opening parenthesis are the syntax that creates a non-capturing group (SO post).
For example:
(?:abc)+ matches strings like abc, abcabc, abcabcabc, etc.
(?:\d+\.){3} matches strings like 1.12.2., 000.00000.0., etc.
Here, you can use
^[abc]{3}(?:,[abc]{3})*$
          ^^
Note that using a capturing group is fraught with unwelcome effects in a lot of Python regex methods. See a classical issue described at re.findall behaves weird post, for example, where re.findall and all other regex methods using this function behind the scenes only return captured substrings if there is a capturing group in the pattern.
In Pandas, it is also important to use non-capturing groups when you just need to group a pattern sequence: Series.str.contains will complain that "this pattern has match groups. To actually get the groups, use str.extract.", and Series.str.extract, Series.str.extractall and Series.str.findall will behave like re.findall.
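The re.findall effect mentioned above can be seen with the standard library alone. This is a minimal sketch; the sample string is taken from the question, and the variable names are made up for illustration.

```python
import re

text = "abc,bca,cbb"

# With a capturing group, findall returns only what the group captured
# (for a repeated group, the last repetition), not the whole match.
captured = re.findall(r"[abc]{3}(,[abc]{3})*", text)

# With a non-capturing group, findall returns the whole match.
whole = re.findall(r"[abc]{3}(?:,[abc]{3})*", text)

print(captured)  # [',cbb']
print(whole)     # ['abc,bca,cbb']
```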

Regex Find English char in text need more than 3

I want to validate that a text has more than 3 [aA-zZ] characters; they do not need to be contiguous.
/^(?![_\-\s0-9])(?!.*?[_\-\s]$)(?=.*[aA-zZ]{3,})[_\-\sa-zA-Z0-9]+$/.test("aaa123") => return true;
/^(?![_\-\s0-9])(?!.*?[_\-\s]$)(?=.*[aA-zZ]{3,})[_\-\sa-zA-Z0-9]+$/.test("a1b2c3") => return false;
Can anybody help me?
How about replacing and counting?
var hasFourPlusChars = function(str) {
    return str.replace(/[^a-zA-Z]+/g, '').length > 3;
};
console.log(hasFourPlusChars('testing1234'));
console.log(hasFourPlusChars('a1b2c3d4e5'));
You need to group .* and [a-zA-Z] in order to allow optional arbitrary characters between English letters:
^(?![_\-\s0-9])(?!.*?[_\-\s]$)(?=(?:.*[a-zA-Z]){3,})[_\-\sa-zA-Z0-9]+$
                                 ^^^          ^
                                 Add this
Demo:
var re = /^(?![_\-\s0-9])(?!.*?[_\-\s]$)(?=(?:.*[a-zA-Z]){3,})[_\-\sa-zA-Z0-9]+$/;
console.log(re.test("aaa123"));
console.log(re.test("a1b2c3"));
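By the way, [aA-zZ] is not a correct range definition: it is read as a, the range A-z (which also covers the characters [, \, ], ^, _ and ` that lie between Z and a in ASCII), and Z. Use [a-zA-Z] instead. See here for more details.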
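The difference between the two character classes is easy to check. The sketch below uses Python rather than JavaScript, but the character-class semantics shown are the same; the names `LOOSE` and `STRICT` are made up for illustration.

```python
import re

LOOSE = re.compile(r"[aA-zZ]")    # a, the range A..z, and Z
STRICT = re.compile(r"[a-zA-Z]")  # a..z and A..Z only

# The six ASCII characters that sit between 'Z' and 'a'.
between = "[\\]^_`"

loose_hits = [ch for ch in between if LOOSE.fullmatch(ch)]
strict_hits = [ch for ch in between if STRICT.fullmatch(ch)]

print(loose_hits)   # all six characters are accepted
print(strict_hits)  # none of them is accepted
```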
Correction of the regex
Your repeat condition should include the ".*". I did not check if your regex is correct for what you want to achieve, but this correction works for the following strings:
$testStrings=["aaa123","a1b2c3","a1b23d"];
foreach($testStrings as $s)
    var_dump(preg_match('/^(?![_\-\s0-9])(?!.*?[_\-\s]$)(?=(?:.*[a-zA-Z]){3,})[_\-\sa-zA-Z0-9]+$/', $s));
Other implementations
As the language seems to be JavaScript, here is an optimised implementation for what you want to achieve:
"a24be4Z".match(/[a-zA-Z]/g).length>=3
We get the list of all matches and check that there are at least 3. Note that match returns null when nothing matches, so this throws for a string without any letters.
That is not the fastest way, because the array of matches has to be built first. The following
/(?:.*?[a-zA-Z]){3}/.test("a24be4Z")
is faster. The lazy ".*?" keeps the "test" method from consuming all characters up to the end of the string before trying other combinations.
As expected, the first suggestion (counting the number of matches) is the slowest.
Check https://jsperf.com/check-if-there-are-3-ascii-characters .
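For readers outside JavaScript, the two strategies above can be sketched in Python as follows. The function names are hypothetical, the input is assumed to be a single line (the dot does not match a newline by default), and no claim about relative speed is made here.

```python
import re

LETTER = re.compile(r"[a-zA-Z]")
# Three letters, each optionally preceded by other characters (lazy).
THREE_LETTERS = re.compile(r"(?:.*?[a-zA-Z]){3}")


def has_three_letters_by_count(text: str) -> bool:
    """Collect every letter and count them; findall returns [] if none."""
    return len(LETTER.findall(text)) >= 3


def has_three_letters_by_search(text: str) -> bool:
    """Stop as soon as a third letter has been found."""
    return THREE_LETTERS.search(text) is not None


for sample in ["a24be4Z", "a1b2c3", "12a", ""]:
    print(repr(sample),
          has_three_letters_by_count(sample),
          has_three_letters_by_search(sample))
```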

Regex match between two tags or else match everything

I have a list of email addresses which take various forms:
john#smith.com
Angie <angie#aol.com>
"Mark Jones" <mark#jones.com>
I'm trying to cut only the email portion from each. Ex: I only want the angie#aol.com from the second item in the list. In other words, I want to match everything between < and > or match everything if it doesn't exist.
I know this can be done in 2 steps:
Capture on (?<=\<)(.*)(?=\>).
If there is no match, use the entire text.
But now I'm wondering: Can both steps be reduced into one simple regular expression?
What about:
(?<=\<).*(?=\>)|^[^<]*$
^[^<]*$ will match the entire string, but only if it doesn't contain a <. And that's OR'ed (|) with what you had.
Explanation:
^ - start of string
[^<] - not-< character
[^<]* - zero or more not-< characters
$ - end of string
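A minimal Python sketch of this alternation, using the sample entries from the question. The helper name `extract_email` is made up, and the angle brackets are written unescaped, which is equivalent in Python's `re` module.

```python
import re
from typing import Optional

# Either the text between < and >, or the whole string if it has no <.
EMAIL = re.compile(r"(?<=<).*(?=>)|^[^<]*$")


def extract_email(entry: str) -> Optional[str]:
    """Return the address part of an entry, or None if nothing matches."""
    match = EMAIL.search(entry)
    return match.group(0) if match else None


for entry in ['john#smith.com', 'Angie <angie#aol.com>', '"Mark Jones" <mark#jones.com>']:
    print(extract_email(entry))
```

An entry with an opening < but no closing > matches neither alternative, so the helper returns None for it.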
You're after an exclusive or operator. Have a look here.
(\<.+\#.+\..+\>) matches only those email addresses that are inside <>...
(\<.+\#.+\..+\>)|(.+) matches the entire string instead of stopping at the first alternative of the OR: at the start of the string the first alternative fails, and .+ then consumes everything.
Depending on what language you are using to implement this regex, you might be able to use an inbuilt exclusive or operator. Otherwise, you might need to put a bit of logic in there to use the string if no matches are found. E.g. (pseudo type code):
string = 'your data above';
if( regex_finds_match ( '(\<.+\#.+\..+\>)', string ) ) {
    // found match, use the match
    str_to_use = regex_match(es);
} else {
    // didn't find a match:
    str_to_use = string;
}
It is possible, but your current logic is probably simpler. Here is what I came up with; the email address will always be in the first capturing group:
^(?:.*<|)(.*?)(?:>|$)
Example: http://rubular.com/r/8tKHaYYY4T
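The capturing-group variant can be tried in Python as well. This is only a sketch: the helper name `email_from_group` is hypothetical, and the entries are assumed to be single lines.

```python
import re
from typing import Optional

# Optionally skip everything up to the last "<", then capture lazily
# until ">" or the end of the string.
EMAIL_GROUP = re.compile(r"^(?:.*<|)(.*?)(?:>|$)")


def email_from_group(entry: str) -> Optional[str]:
    """Return the first capturing group, which holds the address."""
    match = EMAIL_GROUP.match(entry)
    return match.group(1) if match else None


for entry in ['john#smith.com', 'Angie <angie#aol.com>', '"Mark Jones" <mark#jones.com>']:
    print(email_from_group(entry))
```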

how to create regular expression for this sentence?

I have the following statement: {$("#aprilfoolc").val("HoliWed27"); $("#UgadHieXampp").val("ugadicome");} and I want to extract the strings from it. I have written the following regex, but it is not working.
Please help!
(?=[\$("#]?)[\w]*(?<=[")]?)
Your lookaround assertions are using character classes by mistake, and you've confused lookbehind and lookahead. Try the following:
(?<=\$\(")\w*(?="\))
You could use this simpler one :
'{$("#aprilfoolc").val("HoliWed27");}'.match(/\$\(\"#(\w+)\"[^"]*"(\w+)"/)
This returns
["$("#aprilfoolc").val("HoliWed27"", "aprilfoolc", "HoliWed27"]
where the strings you want are at indexes 1 and 2.
This construction
(?=[\$("#]?)
is a lookahead, but an optional one: the character set is followed by a ?. It is also at odds with the next part,
[\w]
which matches word characters only, so the lookahead can never find one of its characters there. Similarly, this part
(?<=[")]?)
can never find one of its characters either, because logically there can never be one of the characters " or ) at the end of a string that matches \w only. Again, since this portion is optional (that ? at the end again), it has no effect.
It's a bit unclear what you are after. Strings inside double quotes, yes, but in the first one you want to skip the hash -- why? Given your input and desired output, this ought to work:
\w+(?=")
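As a quick check, here is that lookahead pattern applied in Python to the full statement from the question. The variable names are made up for illustration.

```python
import re

statement = '{$("#aprilfoolc").val("HoliWed27"); $("#UgadHieXampp").val("ugadicome");}'

# Every run of word characters that is immediately followed by a double quote.
names = re.findall(r'\w+(?=")', statement)

print(names)  # ['aprilfoolc', 'HoliWed27', 'UgadHieXampp', 'ugadicome']
```

The identifier `val` is skipped because it is followed by a parenthesis rather than a quote, and the hash is skipped because it is not a word character.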
Also possible:
/\("[#]?(.*?)"\)/
import re
s='{$("#aprilfoolc").val("HoliWed27");}'
f = re.findall(r'\("[#]?(.*?)"\)',s)
for m in f:
    print(m)
I don't know why you would, but if you want to capture both groups at once:
/\("#(.*?)"\).*?\("(.*?)"\)/
import re
s='{$("#aprilfoolc").val("HoliWed27");}'
f = re.findall(r'\("#(.*?)"\).*?\("(.*?)"\)',s)
for m in f:
    print(m[0], m[1])
In JavaScript:
var s='{$("#aprilfoolc").val("HoliWed27")';
var re=/\("#(.*?)"\).*?\("(.*?)"\)/;
alert(s.match(re));

Regex match any character 5 or more times

I have this regex pattern:
^[.]{5,}$
I want it to return true if the tested string has 5 or more characters.
I.e., it should return false only if the string contains 4 or fewer characters.
At the moment it seems to return true regardless of the number of characters and I can't see why.
You want
^.{5,}$
But really - just use the built-in string length function of the language of your choice
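The reason for the fix is that a dot inside a character class is a literal dot, so `[.]{5,}` only matches five or more "." characters. A small Python sketch of the difference, with made-up constant names:

```python
import re

LITERAL_DOTS = re.compile(r"^[.]{5,}$")  # five or more literal "." characters
ANY_CHARS = re.compile(r"^.{5,}$")       # five or more characters, newline excluded

for sample in ["abcd", "abcde", "....."]:
    print(sample,
          bool(LITERAL_DOTS.search(sample)),
          bool(ANY_CHARS.search(sample)),
          len(sample) >= 5)  # the plain length check suggested above
```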
Try this regex:
.{5,}
I think you don't need the ^ and $. Try just:
.{5,}