GSTIN Validation through condition not working in Perl - regex

I have written Perl code for validating GSTIN Number which is related to India’s tax according to the following rules:
The first two digits represent the state code as per Indian Census 2011. Every state has a unique code.
The next ten digits will be the PAN number of the taxpayer
The thirteenth digit will be assigned based on the number of registration within a state
The fourteenth digit will be Z by default
The last digit will be for check code. It may be an alphabet or a number.
Following is the code:
my $gst_number_input = '35AABCS1429B1AX';
my $gst_number_character_count = length($gst_number_input);
my $gst_validation =~ /\d{2}[A-Z]{5}\d{4}[A-Z]{1}[A-Z\d]{1}[Z]{1}[A-Z\d]{1}/;
if ($gst_number_character_count == 15 && $gst_number_input =~ $gst_validation) {
print "GST Number is valid";
} else {
print "Invalid GST Number";
}
I have an invalid GSTIN input entered in the code. So when I run the script, I get:
GST Number is valid
Instead I should get the error because the GSTIN input is invalid:
Invalid GST Number
Can anyone please help ?
Thanks in advance

In this part you are using =~ where is should be an equals sign =
my $gst_validation =~ /\d{2}[A-Z]{5}\d{4}[A-Z]{1}[A-Z\d]{1}[Z]{1}[A-Z\d]{1}/;
If you want to use is as a variable, you could use qr
Note that you can omit {1} from the pattern and you don't have to use the square brackets around [Z]
You code might look like
my $gst_number_input = '35AABCS1429B1AX';
my $gst_number_character_count = length($gst_number_input);
my $gst_validation = qr/\d{2}[A-Z]{5}\d{4}[A-Z][A-Z\d]Z[A-Z\d]/;
if ($gst_number_character_count == 15 && $gst_number_input =~ $gst_validation) {
print "GST Number is valid";
} else {
print "Invalid GST Number";
}

Related

why does the while loop not work?

I'm creating a program that takes in the users grade(A-F), health(0-100) and economic output(0-100). I need a while loop for when the user inputs a value wrong e.g A for health. why does the loop keep repeating? how do I do this for the grade as well?
name = raw_input(' Enter your name: ')
grade = raw_input(' Enter your grade: ')
string_two = raw_input(' Enter your economic out put: ')
while string_two not in range (0,100):
print 'please enter a value between 0 and 100.'
string_two = raw_input(' Enter your economic out put: ')
string_one = raw_input(' Enter your health: ')
while string_one not in range (0,100):
print 'please enter a value between 0 and 100.'
string_one = raw_input(' Enter your health: ')
health == int(string_one)
economic_output == int(string_two)
if economic_output > 85:
print name + ' you are exceptional! Welcome aboard!'
elif grade == 'A':
print name + ' you are exceptional! Welcome aboard!'
elif economic_output > 60:
if health > 60:
if grade == 'B' or 'C':
print 'Congatulations ' + name +'!' + ' Come aboard!'
else:
print 'Sorry, ' + name + ' but you dont meet the criteria and cammot be permitted to board. '
The while loop keeps repeating because raw_input always returns a string object, regardless if you typed digits only or something else. string_two not in range (0,100) is always true because a string cannot be in any range of ints.
So what you can do. If I were you, I would define a function which would ask the user to enter something in a while loop until his input satisfies some pattern (just as you are trying to do in your code) and convert it to a specific type if necessary. There is a very useful tool to check a string for matching some pattern, it is called regular expressions (RE).
import re
def inputAndValidate(prompt, pattern, convertFunc=str):
result = raw_input(prompt)
while re.match(pattern, result) is None:
print 'Input pattern is "' + pattern + '". Please enter a string matching to it.'
result = raw_input(prompt)
return convertFunc(result)
Putting a piece of code being used several times to a function is a good practice. So here I defined a function called inputAndValidate with 2 mandatory parameters - a prompt to be displayed for user to let him understand what do you want from him, and the pattern to check his input. The third parameter is optional - it's a function which will be applied to the user input. By default, it's str, a function converting something to a string. User input is already a string, so this function will do nothing, but if we need an int, we'll pass int as the third parameter and get an int from the inputAndValidate function, and so on.
re.match(pattern, string) returns a match object if the string did match the pattern, and None if it didn't. See official re documentation for more information.
Usage of the function
name = inputAndValidate('Enter your name: ', r'.+')
# This pattern means any string except empty one. Thus we ensure user will not enter an empty name.
grade = inputAndValidate('Enter your grade (A-F): ', r'[A-F]')
# This pattern means a string containing only one symbol in the A-F range (in the ASCII table).
economic_output = inputAndValidate('Enter your economic output (0-100): ', r'\d{1,3}', int)
# This pattern means a string containing from one to three digits. `int` passed to the third parameter means the result will be an int.
health = inputAndValidate('Enter your health (0-100): ', r'\d{1,3}', int)
# Same as the previous one.
re module documentation also contains information about the regular expressions language, it can help you understand the patterns I used.

Regular expression for comma seperated name search with wild card

Right now I am using multiple if conditions to valid the input for search by name with wildcard(*). Since I have multiple 'if' with inner 'if' statements I am trying to use regular expression to validate my input. I want to use this expression in both front end and back end.
Appreciate if anyone can help.
Validating rules are follow
Input is last name, first name i.e. separated by comma.
Must have at least two characters while using wild card search.
Valid wildcard character is '*' only.
At most two wildcard characters can be used.
No consecutive wild cards.
If no wild card used no constraint on length of characters in both last and first name.
Some of the valid inputs are:
- hopkins, johns
- h, j
- ho*, jp*
- *ins, johns
- *op*, john*
Some of the invalid inputs are:
- hopkins johns
- h*, johns
- hop**, joh*
- h*pk*n*
If regular expression not going to be complex we can consider this as valid otherwise it OK to consider this as invalid
- ho*in*, jo*
In short general name format is
[*]XX[*], [*]XX[*]
where [] ==> Optional
X ==> A-Z, a-z
XX ==> length 2 or more if wild card used
You can use this regex
\*?[a-zA-Z]{2,}\*?, \*?[a-zA-Z]{2,}\*?
The before doing validation with the above regex, just do something like match the number of * with the regex /\*/g and make sure it's length is between 0 to 2.
With the help of #Amit_Joki answer I wrote the following code and its working fine.
var nameArray = [...];
var re = /\*?[a-zA-Z]{2,}\*?, \*?[a-zA-Z]{2,}\*?/;
for (var i = 0; i < nameArray.length; i++) {
if(nameArray[i].indexOf(',') < 0 ||
(nameArray[i].indexOf('*') >= 0 && !re.test(nameArray[i]))) {
console.log(nameArray[i] + ": Invalid");
} else {
console.log(nameArray[i] + ": Valid");
}
}

Regular expression needed for specific string in PHP

I need a regular expression to validate a string with the following conditions
String might contain any of digits space + - () / .
If string contain anything else then it should be invalid
If there is any + in the string then it should be at the beginning and there should at most one + , otherwise it would be invalid, if there are more than one + then it is invalid
String should be 7 to 20 character long
It is not compulsory to have all these digits space + - () / .
But it is compulsory to contain at least 7 digit
I think you are validating phone numbers with E.164 format. Phone number can contain many other format. It can contain . too. Multiple spaces in a number is not uncommon. So its better to format all the numbers to a common format and store that format in db. If that common format is wrong you can throw error.
I validate those phone numbers like this.
function validate_phone($phone){
// replace anything non-digit and add + at beginning
$e164 = "+". preg_replace('/\D+/', '', $phone);
// check validity by length;
return (strlen($e164)>6 && strlen($e164)<21);
}
Here I store $e164 in Db if its valid.
Even after that you can not validate a phone number. A valid phone number format does not mean its a valid number. For this an sms or call is generated against the number and activation code is sent. Once the user inputs the code phone number is fully validated.
You can do this in one regex:
/^(?=(?:.*\d){7})[0-9 ()\/+-][0-9 ()\/-]{6,19}$/
However I would personally do something like:
/^[0-9 ()\/+-][0-9 ()\/-]{6,19}$/
And then strip any non-digit and see if the remaining string is 7 or longer.
Let's try ...
preg_match('/^(?=(?:.*\d){7})[+\d\s()\/\-\.][\d\s()\/\-\.]{6,19}$/', $text);
Breaking this down:
We start with a positive look-ahead that requires a digit at least 7 times.
Then we match all the valid characters, including the plus.
Followed by matching all the valid characters without plus between 6 and 20 times.
A little more concise:
^\+?(?=(.*\d){7})[()/\d-]{7,19}$
'Course, why would you even use regular expressions?
function is_valid($string) {
$digits = 0;
$length = strlen($string);
if($length < 7 || $length > 20) {
return false;
}
for($i = 0; $i < $length; $i++) {
if(ctype_digit($string[$i])) {
$digits++;
} elseif(strpos('+-() ', $string[$i]) === false && ($string[$i] !== '+' || $i !== 0)) {
return false;
}
}
return $digits >= 7;
}

Regex performance: validating alphanumeric characters

When trying to validate that a string is made up of alphabetic characters only, two possible regex solutions come to my mind.
The first one checks that every character in the string is alphanumeric:
/^[a-z]+$/
The second one tries to find a character somewhere in the string that is not alphanumeric:
/[^a-z]/
(Yes, I could use character classes here.)
Is there any significant performance difference for long strings?
(If anything, I'd guess the second variant is faster.)
Just by looking at it, I'd say the second method is faster.
However, I made a quick non-scientific test, and the results seem to be inconclusive:
Regex Match vs. Negation.
P.S. I removed the group capture from the first method. It's superfluous, and would only slow it down.
Wrote this quick Perl code:
#testStrings = qw(asdfasdf asdf as aa asdf as8up98;n;kjh8y puh89uasdf ;lkjoij44lj 'aks;nasf na ;aoij08u4 43[40tj340ij3 ;salkjaf; a;lkjaf0d8fua ;alsf;alkj
a a;lkf;alkfa as;ldnfa;ofn08h[ijo ok;ln n ;lasdfa9j34otj3;oijt 04j3ojr3;o4j ;oijr;o3n4f;o23n a;jfo;ie;o ;oaijfoia ;aosijf;oaij ;oijf;oiwj;
qoeij;qwj;ofqjf08jf0 ;jfqo;j;3oj4;oijt3ojtq;o4ijq;onnq;ou4f ;ojfoqn;aonfaoneo ;oef;oiaj;j a;oefij iiiii iiiiiiiii iiiiiiiiiii);
print "test 1: \n";
foreach my $i (1..1000000) {
foreach (#testStrings) {
if ($_ =~ /^([a-z])+$/) {
#print "match"
} else {
#print "not"
}
}
}
print `date` . "\n";
print "test 2: \n";
foreach my $j (1..1000000) {
foreach (#testStrings) {
if ($_ =~ /[^a-z]/) {
#print "match"
} else {
#print "not"
}
}
}
then ran it with:
date; <perl_file>; date
it isn't 100% scientific, but it gives us a good idea. The first Regex took 10 or 11 seconds to execute, the second Regex took 8 seconds.

Regular Expression to find numbers with same digits in different order

I have been looking for a regular expression with Google for an hour or so now and can't seem to work this one out :(
If I have a number, say:
2345
and I want to find any other number with the same digits but in a different order, like this:
2345
For example, I match
3245 or 5432 (same digits but different order)
How would I write a regular expression for this?
There is an "elegant" way to do it with a single regex:
^(?:2()|3()|4()|5()){4}\1\2\3\4$
will match the digits 2, 3, 4 and 5 in any order. All four are required.
Explanation:
(?:2()|3()|4()|5()) matches one of the numbers 2, 3, 4, or 5. The trick is now that the capturing parentheses match an empty string after matching a number (which always succeeds).
{4} requires that this happens four times.
\1\2\3\4 then requires that all four backreferences have participated in the match - which they do if and only if each number has occurred once. Since \1\2\3\4 matches an empty string, it will always match as long as the previous condition is true.
For five digits, you'd need
^(?:2()|3()|4()|5()|6()){5}\1\2\3\4\5$
etc...
This will work in nearly any regex flavor except JavaScript.
I don't think a regex is appropriate. So here is an idea that is faster than a regex for this situation:
check string lengths, if they are different, return false
make a hash from the character (digits in your case) to integers for counting
loop through the characters of your first string:
increment the counter for that character: hash[character]++
loop through the characters of the second string:
decrement the counter for that character: hash[character]--
break if any count is negative (or nonexistent)
loop through the entries, making sure each is 0:
if all are 0, return true
else return false
EDIT: Java Code (I'm using Character for this example, not exactly Unicode friendly, but it's the idea that matters now):
import java.util.*;
public class Test
{
public boolean isSimilar(String first, String second)
{
if(first.length() != second.length())
return false;
HashMap<Character, Integer> hash = new HashMap<Character, Integer>();
for(char c : first.toCharArray())
{
if(hash.get(c) != null)
{
int count = hash.get(c);
count++;
hash.put(c, count);
}
else
{
hash.put(c, 1);
}
}
for(char c : second.toCharArray())
{
if(hash.get(c) != null)
{
int count = hash.get(c);
count--;
if(count < 0)
return false;
hash.put(c, count);
}
else
{
return false;
}
}
for(Integer i : hash.values())
{
if(i.intValue()!=0)
return false;
}
return true;
}
public static void main(String ... args)
{
//tested to print false
System.out.println(new Test().isSimilar("23445", "5432"));
//tested to print true
System.out.println(new Test().isSimilar("2345", "5432"));
}
}
This will also work for comparing letters or other character sequences, like "god" and "dog".
Put the digits of each number in two arrays, sort the arrays, find out if they hold the same digits at the same indices.
RegExes are not the right tool for this task.
You could do something like this to ensure the right characters and length
[2345]{4}
Ensuring they only exist once is trickier and why this is not suited to regexes
(?=.*2.*)(?=.*3.*)(?=.*4.*)(?=.*5.*)[2345]{4}
The simplest regular expression is just all 24 permutations added up via the or operator:
/2345|3245|5432|.../;
That said, you don't want to solve this with a regex if you can get away with it. A single pass through the two numbers as strings is probably better:
1. Check the string length of both strings - if they're different you're done.
2. Build a hash of all the digits from the number you're matching against.
3. Run through the digits in the number you're checking. If you hit a match in the hash, mark it as used. Keep going until you don't get an unused match in the hash or run out of items.
I think it's very simple to achieve if you're OK with matching a number that doesn't use all of the digits. E.g. if you have a number 1234 and you accept a match with the number of 1111 to return TRUE;
Let me use PHP for an example as you haven't specified what language you use.
$my_num = 1245;
$my_pattern = '/[' . $my_num . ']{4}/'; // this resolves to pattern: /[1245]{4}/
$my_pattern2 = '/[' . $my_num . ']+/'; // as above but numbers can by of any length
$number1 = 4521;
$match = preg_match($my_pattern, $number1); // will return TRUE
$number2 = 2222444111;
$match2 = preg_match($my_pattern2, $number2); // will return TRUE
$number3 = 888;
$match3 = preg_match($my_pattern, $number3); // will return FALSE
$match4 = preg_match($my_pattern2, $number3); // will return FALSE
Something similar will work in Perl as well.
Regular expressions are not appropriate for this purpose. Here is a Perl script:
#/usr/bin/perl
use strict;
use warnings;
my $src = '2345';
my #test = qw( 3245 5432 5542 1234 12345 );
my $canonical = canonicalize( $src );
for my $candidate ( #test ) {
next unless $canonical eq canonicalize( $candidate );
print "$src and $candidate consist of the same digits\n";
}
sub canonicalize { join '', sort split //, $_[0] }
Output:
C:\Temp> ks
2345 and 3245 consist of the same digits
2345 and 5432 consist of the same digits