Validate field input using a Django serializer

I have a field code in my Django model.
I have to check that the input string has the following properties:
Length should be 5 characters
1st and 2nd characters should be letters
3rd and 4th characters should be digits
5th character should be E or N
I have a serializer as shown below which satisfies the first condition.
class MyModelSerializer(serializers.ModelSerializer):
    code = serializers.CharField(max_length=5, min_length=5)

    class Meta:
        model = MyModel
        fields = '__all__'
How can I fulfil the remaining conditions?

The remaining conditions can be fulfilled by adding a custom validator and using regex.
import re

from rest_framework import serializers

class MyModelSerializer(serializers.ModelSerializer):
    code = serializers.CharField(max_length=5, min_length=5)

    class Meta:
        model = MyModel
        fields = '__all__'

    def validate_code(self, value):
        # check if the code matches the required pattern
        pattern = r'^[A-Za-z]{2}\d{2}[EN]$'
        if not re.match(pattern, value):
            raise serializers.ValidationError('Code does not match required format')
        return value
The name of the method matters: Django REST Framework only calls a field-level validator if it is named validate_<field_name>, so for the code field it must be validate_code. The method takes a value argument, which is the value of the field being validated, and must either return the validated value or raise serializers.ValidationError.
Here is a breakdown of the regex pattern used above.
1st and 2nd characters should be letters:
[A-Za-z]{2} matches exactly two ASCII letters, so the first and second characters must be letters
3rd and 4th characters should be digits:
\d{2} matches exactly two digits, so the third and fourth characters must be digits
5th character should be E or N:
[EN] matches either E or N, so the fifth character must be E or N
The ^ and $ anchors tie the pattern to the start and end of the string, so no extra characters are allowed.
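To sanity-check the pattern outside of a serializer, here is a minimal sketch that uses only the standard re module. The helper name is_valid_code and the sample inputs are made up for illustration; re.fullmatch is used so the anchors are not needed.

```python
import re

# Same pattern as in validate_code, without ^ and $ because fullmatch
# already requires the whole string to match.
CODE_PATTERN = re.compile(r'[A-Za-z]{2}\d{2}[EN]')

def is_valid_code(value):
    """Return True if value is two letters, two digits, then E or N."""
    return CODE_PATTERN.fullmatch(value) is not None

# Hypothetical sample inputs, not taken from the question.
for sample in ['AB12E', 'xy99N', 'A112E', 'AB12X', 'AB123E']:
    print(sample, is_valid_code(sample))
```

Only the first two samples should be accepted: the third has a digit in second position, the fourth ends in X, and the fifth is six characters long.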

Related

RegEx: Any string must contain at least N chars from a specific list of chars

I'm very new to learning RegEx, and need a little help updating what I have. I will use it to evaluate student spreadsheet functions. I know it isn't perfect, but I'm trying to use this as a stepping stone to a better understanding of RegEx. I currently have [DE45\\+\\s]*$ but it does not validate for criteria #4 below. Any help is greatly appreciated.
I need to validate an input so that it matches these four criteria:
1. Letters D and E (uppercase, in any order, in any length string)
2. Numbers 4 and 5 (in any order, in any length string)
3. Special characters: comma (,) and plus (+) (in any order, in any length string)
4. All six characters DE45+, must be present in the string at least once.
Results
pass: =if(D5>0,E4+D5,0)
pass: =if(D5>0,D5+E4,0)
fail: Dad Eats # 05:40
pass: Dad, Eats+Drinks # 05:40
fail: =if(E4+D5)
pass: DE45+,
The attempt you made -- with a character class -- will not work since [DE45] matches any single character in the class -- not all of them.
This type of problem is solved with a series of anchored lookaheads where all of these need to be true for a match at the anchor:
^(?=.*D)(?=.*E)(?=.*\+)(?=.*4)(?=.*5)(?=.*,)
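As a quick check, the lookahead pattern can be run against the sample strings from the question. This is a sketch in Python that assumes single-line input, since . does not match a newline by default; the names ALL_SIX, samples and results are illustrative.

```python
import re

# Each lookahead is evaluated from the start of the string; all six must
# succeed for the pattern to match.
ALL_SIX = re.compile(r'^(?=.*D)(?=.*E)(?=.*\+)(?=.*4)(?=.*5)(?=.*,)')

samples = [
    '=if(D5>0,E4+D5,0)',
    '=if(D5>0,D5+E4,0)',
    'Dad Eats # 05:40',
    'Dad, Eats+Drinks # 05:40',
    '=if(E4+D5)',
    'DE45+,',
]

results = {s: bool(ALL_SIX.search(s)) for s in samples}
for s, ok in results.items():
    print('pass' if ok else 'fail', s)
```

The two failing strings are the ones from the question: one lacks both + and the comma, the other lacks the comma.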
Also, depending on the language, you can chain logic with regex matches. In Perl for example you would do:
/D/ && /E/ && /\+/ && /4/ && /5/ && /,/
In Python:
all(re.search(p, a_str) for p in [re.escape(c) for c in 'DE45+,'])
Of course easier still is to use a language's set functions to test that all required characters are present.
Here in Python:
set(a_str) >= set('DE45+,')
This returns True only if all the characters in 'DE45+,' are in a_str.
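When grading student input it may also help to know which characters are missing, not only whether the check passed. A small sketch using set difference follows; the helper name missing_chars is hypothetical.

```python
REQUIRED = set('DE45+,')

def missing_chars(text):
    """Return the required characters that do not occur in text."""
    return REQUIRED - set(text)

print(sorted(missing_chars('=if(E4+D5)')))        # [',']
print(sorted(missing_chars('Dad Eats # 05:40')))  # ['+', ',']
print(sorted(missing_chars('DE45+,')))            # []
```

An empty result means the string passes, which is equivalent to set(text) >= REQUIRED.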
A Regular Expression character class (in the square brackets) is an OR search. It will match if any one of the characters in it is present, which does not allow you to verify #4.
For that you could build on top of a regex, as follows:
Find all instances of any of the characters you're looking for individually with a simple character class search. (findall using [DE45+,]+)
Merge all the found characters into one string (join)
Do a set comparison with {DE45+,}. This will only be True if all the characters are present, in any amount and in any order (set)
set(''.join(re.findall(r'[DE45+,]+','if(D5>0,4+D5,0)E'))) == set('DE45+,')
You can generalize this for any set of characters:
import re
lookfor = 'DE45+,'
lookfor_re = re.compile(f'[{re.escape(lookfor)}]+')
strings = ['=if(D5>0,E4+D5,0)', '=if(D5>0,D5+E4,0)', 'Dad Eats # 05:40', 'Dad, Eats+Drinks # 05:40', '=if(E4+D5)', 'DE45+,']
for s in strings:
    found = set(''.join(lookfor_re.findall(s))) == set(lookfor)
    print(f'{s} : {found}')
Just set lookfor to a string containing each of the characters you're looking for, and strings to a list of the strings to search. You don't need to escape any special characters with \ yourself; re.escape does this for you.
Output:
=if(D5>0,E4+D5,0) : True
=if(D5>0,D5+E4,0) : True
Dad Eats # 05:40 : False
Dad, Eats+Drinks # 05:40 : True
=if(E4+D5) : False
DE45+, : True

convert string to regex pattern

I want to derive a regular expression pattern from a character string. My goal is to reuse this pattern to find strings with the same structure in another context.
From the string "1example4whatitry2do",
I want to get a pattern like: [0-9]{1}[a-z]{7}[0-9]{1}[a-z]{8}[0-9]{1}[a-z]{2}
So I can reuse this pattern to find this other example string: 2eytmpxe8wsdtmdry1uo
I can loop over each character, but I hope there is a faster way.
Thanks for your help!
You can puzzle this out:
go over your strings characterwise
if the character is a text character add a 't' to a list
if the character is a number add a 'd' to a list
if the character is something else, add itself to the list
Use itertools.groupby to group consecutive identical letters into groups.
Create a pattern from the group-key and the length of the group using some string literal formatting.
Code:
from itertools import groupby
from string import ascii_lowercase

lower_case = set(ascii_lowercase)  # set for faster lookup

def find_regex(p):
    cum = []
    for c in p:
        if c.isdigit():
            cum.append("d")
        elif c in lower_case:
            cum.append("t")
        else:
            cum.append(c)
    grp = groupby(cum)
    return ''.join(f'\\{what}{{{how_many}}}'
                   if how_many>1 else f'\\{what}'
                   for what,how_many in ( (g[0],len(list(g[1]))) for g in grp))

pattern = "1example4...whatit.ry2do"
print(find_regex(pattern))
Output:
\d\t{7}\d\.{3}\t{6}\.\t{2}\d\t{2}
The ternary in the formatting omits the redundant {1} from the pattern.
See:
str.isdigit()
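The '\t' in the output is only a placeholder for "text character" (as a regex it would match a tab). If you now replace '\t' with '[a-z]', your regex should fit. You could also replace the isdigit check with a regex r'\d' or with c in set(string.digits).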
pattern = "1example4...whatit.ry2do"
pat = find_regex(pattern).replace(r"\t","[a-z]")
print(pat) # \d[a-z]{7}\d\.{3}[a-z]{6}\.[a-z]{2}\d[a-z]{2}
See
string module for ascii_lowercase and digits
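For comparison, here is a compact variant of the same idea that emits [a-z] directly and escapes other characters with re.escape, then checks the result against the second string from the question. This is a sketch, not the answer's code: the names char_class and shape_to_regex are made up, and only ASCII digits and lowercase letters are classified.

```python
import re
from itertools import groupby

def char_class(c):
    """Map a character to the regex fragment that should match it."""
    if "0" <= c <= "9":
        return r"\d"
    if "a" <= c <= "z":
        return "[a-z]"
    return re.escape(c)  # anything else must match literally

def shape_to_regex(sample):
    """Build a regex describing the digit/letter layout of sample."""
    parts = []
    # groupby collects consecutive characters that map to the same fragment
    for fragment, run in groupby(sample, key=char_class):
        n = len(list(run))
        parts.append(fragment if n == 1 else f"{fragment}{{{n}}}")
    return "".join(parts)

pattern = shape_to_regex("1example4whatitry2do")
print(pattern)  # \d[a-z]{7}\d[a-z]{8}\d[a-z]{2}
print(bool(re.fullmatch(pattern, "2eytmpxe8wsdtmdry1uo")))  # True
```

Using groupby with a key function avoids the intermediate list of placeholder letters and the final replace step.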

Regular Expression: is it possible to get numbers in optional parts by one regex

I have a string that looks like: 1A2B3C or 2B3C or 1A2B or 1A3C.
The string is composed of several optional parts, each a number followed by one of A, B or C.
Is it possible to get the number before every character with one regex?
For example:
1A2B3C => (1, 2, 3)
1A3C => (1, 0, 3) There is no 'B', so gives 0 instead.
=> Or just (1, 3) but should show that the 3 is in front of 'C'.
Assuming Python because of your tuple notation, and because that's what I feel like using.
If the only allowed letters are A, B and C, you can do it with an extra processing step:
import re

# every part, including the A part, is optional
pattern = re.compile(r'(?:(\d+)A)?(?:(\d+)B)?(?:(\d+)C)?')
match = pattern.fullmatch(some_string)
if match:
    result = tuple(int(g) for g in match.groups('0'))
else:
    raise ValueError('Bad input string')
Each part is wrapped in a non-capturing group (?:...) so that the trailing ? applies to the whole unit. Inside the unit, there is a capturing group (\d+) to capture the number, and an uncaptured character.
The method Match.groups returns a tuple of all the groups in the regex, with unmatched ones replaced by the default passed in, here '0'. The generator expression then converts each one to an int. You could also use tuple(map(int, match.groups('0'))).
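To see the default value at work, here is a small self-contained run over the four strings from the question. It is a sketch that assumes every part, including the A part, is optional; the names PARTS and parse_counts are illustrative.

```python
import re

# Three optional "number + letter" units, in fixed A, B, C order.
PARTS = re.compile(r'(?:(\d+)A)?(?:(\d+)B)?(?:(\d+)C)?')

def parse_counts(text):
    """Return (a, b, c); a part missing from text is reported as 0."""
    match = PARTS.fullmatch(text)
    if match is None:
        raise ValueError(f'Bad input string: {text!r}')
    # groups('0') substitutes '0' for every group that did not participate
    return tuple(int(g) for g in match.groups('0'))

for sample in ['1A2B3C', '2B3C', '1A2B', '1A3C']:
    print(sample, '=>', parse_counts(sample))
# 1A2B3C => (1, 2, 3)
# 2B3C => (0, 2, 3)
# 1A2B => (1, 2, 0)
# 1A3C => (1, 0, 3)
```

Note that with all three parts optional, the empty string also matches and yields (0, 0, 0); add an explicit check if that should be rejected.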
You can also use a dictionary to hold the numbers, keyed by character:
pattern = re.compile(r'(?:(?P<A>\d+)A)?(?:(?P<B>\d+)B)?(?:(?P<C>\d+)C)?')
match = pattern.fullmatch(some_string)
if match:
    result = {k: int(v) for k, v in match.groupdict('0').items()}
else:
    raise ValueError('Bad input string')
Match.groupdict is just like groups except that it returns a dictionary of the named groups: capture groups marked (?P<NAME>...).
Finally, if you don't mind having the dictionary, you can adapt this approach to parse any number of groups with arbitrary characters:
pattern = re.compile(r'(\d+)([A-Z])')
result = {}
while some_string:
    match = pattern.match(some_string)
    if not match:
        raise ValueError('Bad input string')
    result[match.group(2)] = int(match.group(1))
    some_string = some_string[match.end():]

Python regex matching enumerated lists

I have a python string of the following format
string = 'Some text.\n1. first item\n2. second item\n3. third item\nSome more text.'
What I want to match is the substring \n1. first item\n2. second item\n3. third item, effectively, the enumerated list within the string. For my purposes, I do not necessarily need to match the first \n.
What I've tried so far:
re.findall('\n.*\d\..*', req, re.DOTALL)
re.findall('\n.*\d\..*?', req, re.DOTALL)
The first case finds the last line of the text which I don't want, and the second case doesn't find the rest of line 3. The key difficulty I'm facing is that I don't know how to make the first .* greedy (and match over newlines) but make the second .* simply match up to a newline.
Note: The number of items in the enumerated string is unknown so I can't just match three numbered lines. It could be any number of lines. The string provided is simply an example which happens to have three enumerated items.
How about using line-wise matching and a filter?
import re

string = 'Some text.\n1. first item\n2. second item\n3. third item\nSome more text.'
is_enumerated = re.compile(r"^\d+\.\s")
matches = list(filter(lambda line: is_enumerated.match(line), string.splitlines()))
# ['1. first item', '2. second item', '3. third item']
You can join the matches with \n, if you want.
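If a single contiguous substring is wanted, as in the question, one possible alternative is a multiline regex that matches an enumerated line followed by any number of further enumerated lines. This is a sketch and not part of the answer above; the name enumerated_block is made up, and it returns only the first run of numbered lines.

```python
import re

string = 'Some text.\n1. first item\n2. second item\n3. third item\nSome more text.'

# With re.MULTILINE, ^ matches at the start of every line. Without
# re.DOTALL, .* stops at the end of the line, so each repetition of the
# group consumes exactly one further enumerated line.
enumerated_block = re.compile(r'^\d+\..*(?:\n\d+\..*)*', re.MULTILINE)

match = enumerated_block.search(string)
block = match.group(0) if match else ''
print(repr(block))  # '1. first item\n2. second item\n3. third item'
```

Unlike the line filter, this stops at the first line that is not numbered, so numbered lines later in the text are not included.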

Input validation with Python 3.4: can only contain

I would like to allow these characters [a-z]+\.+[0-9]*\_* (Must contain one or more lowercase alphabetical characters(a-z) and Must contain one or more periods(.) also can contain zero or more digits(0-9), zero or more underscores(_)) , but no others.
I have tried multiple ways without success:
import re
iStrings = str(input('Enter string? '))
iMatch = re.findall(r'[a-z]+\.+[0-9]*\_*', iStrings)
iiMatch = re.findall(r'[~`!#$%^&*()-+={}\[]|\;:\'"<,>.?/]', iStrings)
iiiMatch = iMatch != iiMatch
if iiiMatch:
    print(':)')
else:
    print(':(')
Another example:
import re
iStrings = str(input('Enter string? '))
iMatch = re.findall(r'[a-z]+\.+[0-9]*\_*', iStrings) not "[~`!#$%^&*()-+={}\[]|\;:\'"<,>.?/]" in iStrings
if iMatch:
    print(':)')
else:
    print(':(')
Any help would be much appreciated.
Edit: added clarification.
Edit: For additional information: https://forums.uberent.com/threads/beta-mod-changes.51520/page-8#post-939265
"allow these characters [a-z]+\.+[0-9]*\_*"
First off, [a-z]+ is not "a" character. Neither is [0-9]* nor \_*
I am assuming that you mean you want to allow letters, digits, underscores, dots, plusses and asterisks.
Try this:
^[\w*.+]+$
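The \w already matches [a-z], [0-9] and _ (it also matches uppercase letters and, for str patterns in Python 3, non-ASCII word characters unless re.ASCII is set)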
The anchors ^ and $ ensure that nothing else is allowed.
From your question I wasn't clear if you wanted to allow a + character to match. If not, remove it from the character class: ^[\w*.]+$. Likewise, remove the * if it isn't needed.
In code:
import re

subject = input('Enter string? ')
if re.search(r"^[\w*.+]+$", subject):
    print('Successful match')
else:
    print('Match attempt failed')
EDIT following your comment:
For a string that must contain one or more lowercase letters AND one or more dots, and may contain zero or more underscores and digits, we need lookaheads to enforce the "one or more" conditions. You can use this:
^(?=.*[a-z])(?=.*\.)[\w_.]+$
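Because \w also accepts uppercase letters, the pattern above lets a string such as Abc.def through. If "no others" is meant literally, a stricter character class can be used instead. The sketch below compares the two; STRICT is an assumed variant, not part of the original answer, and the sample strings are invented.

```python
import re

# Pattern from the answer; \w also admits uppercase letters.
LOOSE = re.compile(r'^(?=.*[a-z])(?=.*\.)[\w_.]+$')

# Assumed stricter variant: only a-z, 0-9, underscore and dot are allowed.
# It is applied with fullmatch, so no anchors are needed.
STRICT = re.compile(r'(?=.*[a-z])(?=.*\.)[a-z0-9_.]+')

def check(text):
    """Return (loose_result, strict_result) for text."""
    return (LOOSE.search(text) is not None,
            STRICT.fullmatch(text) is not None)

for sample in ['abc.def', 'a.1_', 'abc', '123.', 'Abc.def', 'a.b!']:
    print(sample, check(sample))
```

The two patterns agree on every sample except Abc.def, which only the loose pattern accepts.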