Regex with little complex situation (If- Else) - regex

My pattern is going to be:
'#' + '6 chars' + '2 digits' + 'optional value depending on the 2 digits' + '3 chars'
The optional value is determined as follows:
if '01' then '6 hexadecimal chars'
if '02' then '1 hexadecimal char'
if anything else then NO VALUE
So possible values could be:
#DEEPAK0100FFBASHA, which breaks into (# DEEPAK 01 00FFBA SHA)
#DEEPAK02AXYZ, which breaks into (# DEEPAK 02 A XYZ)
#SHARMA99XYZ, which breaks into (# SHARMA 99 XYZ)
How do I write a regex that breaks a given string up accordingly?

You may try the below anchored regex.
@"^#[A-Z]{6}(?:01[\dA-F]{6}|02[\dA-F]|(?:(?!0[12])\d){2})[A-Z]{3}$"
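To actually break the string into its parts, the alternation can carry capturing groups. The following is a minimal Python sketch of that idea, not the answerer's code: the name split_record and the returned tuple layout are made up for illustration, and it assumes uppercase letters and uppercase hex digits, as the regex above does.

```python
import re

# One capturing group per field. The three alternatives are mutually
# exclusive, so at most one (code, value) pair is filled in per match.
RECORD = re.compile(
    r"#([A-Z]{6})"              # 6 letters
    r"(?:(01)([0-9A-F]{6})"     # code 01 followed by 6 hex digits
    r"|(02)([0-9A-F])"          # code 02 followed by 1 hex digit
    r"|((?!0[12])[0-9]{2}))"    # any other 2-digit code, no value
    r"([A-Z]{3})"               # 3 letters
)


def split_record(text):
    """Return (name, code, value, suffix), or None if text does not match."""
    m = RECORD.fullmatch(text)
    if m is None:
        return None
    name, c1, v1, c2, v2, c3, suffix = m.groups()
    # Only one of c1/c2/c3 is set; value stays None for "other" codes.
    return (name, c1 or c2 or c3, v1 or v2, suffix)


print(split_record("#DEEPAK0100FFBASHA"))  # ('DEEPAK', '01', '00FFBA', 'SHA')
print(split_record("#DEEPAK02AXYZ"))       # ('DEEPAK', '02', 'A', 'XYZ')
print(split_record("#SHARMA99XYZ"))        # ('SHARMA', '99', None, 'XYZ')
```

fullmatch plays the role of the ^ and $ anchors, so the pattern itself does not need them.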

Related

phone number search postgres

Suppose I have the following data in the "people" table
id name phone_number
1 Pete +651234-5678
2 John 1234 56 78 Main number
If I had a search string of "123456789" how would I extract the rows that match the phone number - I assume some regex is required but what is the best approach to do this in Postgres?
Looks like this will work
x="SELECT id FROM people WHERE regexp_replace(phone_number, '[^0-9]', '', 'g') = '12345678';"
or perhaps
x="SELECT id FROM people WHERE regexp_replace(phone_number, '[^0-9]', '', 'g') LIKE '12345678';"
Is this the best way?
Your original search string ("123456789") would of course not be found, but I think mine is the complete answer; only votes will tell. Note that position() returns 0 when the substring is absent, so the test must be > 0:
select *, regexp_replace(phone_number, '[^0-9]', '', 'g')
from people
where position('12345678' in regexp_replace(phone_number, '[^0-9]', '', 'g')) > 0;
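The same "strip the non-digits, then look for the substring" logic can be checked outside the database. This is only a Python sketch of the idea using the two sample rows from the question; the names PEOPLE, digits_only and find_ids are invented for the example.

```python
import re

# Sample rows from the question: (id, name, phone_number).
PEOPLE = [
    (1, "Pete", "+651234-5678"),
    (2, "John", "1234 56 78 Main number"),
]


def digits_only(text):
    """Drop every character that is not an ASCII digit."""
    return re.sub(r"[^0-9]", "", text)


def find_ids(rows, search):
    """Return the ids whose stripped phone number contains the stripped search string."""
    needle = digits_only(search)
    return [row_id for row_id, _name, phone in rows if needle in digits_only(phone)]


print(find_ids(PEOPLE, "12345678"))   # [1, 2]
print(find_ids(PEOPLE, "123456789"))  # []
```

A substring test is what lets row 1 match even though its stored number carries the extra leading "65".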
You could use the following query:
SELECT * FROM people WHERE phone_number ~ '^1234\s*56\s*78.*'
The pattern \s* will match zero or more spaces. The regex will match any of the following:
1234 56 78 Main number
123456 78 some text
12345678

Match phone numbers with lengths between 8-16 digits, ignoring ()+-

Consider the following:
+12 34 456 432
(12) 34 567 124
1234 56 78 90
(1234) 567 890
1234-567-890
1234 - 567 - 890
12 34 56 78
12-34-56-78
Assume these are all valid phone number structures
Can a regex be used to express: find at least 8 digits, but not more than 16, and ignore spaces, round brackets, the plus symbol (once) and the minus?
My current working sample is a mess:
^([\+|\(]{1,2})?+(\d{2,4})+([ |-|\)]{1,2})?+(\d{2,3})+([ |-]{1})?+(\d{2,3})+([ |-]{1})?+(\d{2,3})?$
Even though phone number validation is recommended against, is there not a simpler regex syntax for these things?
To just count the digits and ignore any -, ), ( or spaces (allowing a + at the beginning), you can use the following regex:
^\+?(?:[ ()-]*\d){8,16}$
It matches
^ - start of string
\+? - one or zero +
(?:[ ()-]*\d){8,16} - 8 to 16 sequences of...
[ ()-]* - 0 or more -, ), ( or a space characters
\d - a digit
$ - end of string
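As a quick check, the pattern can be run against the sample numbers from the question. This is a small Python sketch; the names PHONE, SAMPLES and is_valid are illustrative, and fullmatch stands in for the ^ and $ anchors.

```python
import re

# Optional leading +, then 8 to 16 digits, each optionally preceded by
# any run of spaces, round brackets or hyphens.
PHONE = re.compile(r"\+?(?:[ ()-]*[0-9]){8,16}")

SAMPLES = [
    "+12 34 456 432",
    "(12) 34 567 124",
    "1234 56 78 90",
    "(1234) 567 890",
    "1234-567-890",
    "1234 - 567 - 890",
    "12 34 56 78",
    "12-34-56-78",
]


def is_valid(number):
    """True if the whole string fits the 8-to-16-digit pattern."""
    return PHONE.fullmatch(number) is not None


print(all(is_valid(n) for n in SAMPLES))  # True
print(is_valid("1234 567"))               # False: only 7 digits
print(is_valid("12345678901234567"))      # False: 17 digits
```

Because every separator run must be followed by a digit, a string with trailing separators (for example a closing bracket at the very end) is rejected.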
This may ease your task.
First, remove everything that is not a number:
myString = myString.replace(/\D/g,'');
You'll get this:
1234456432
1234567124
1234567890
1234567890
1234567890
1234567890
12345678
12345678
Then just check the length:
if (myString.length >= 8 && myString.length <= 16)
// Do stuff
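For comparison, here is the same two-step idea (strip, then count) as a Python sketch. The function name has_valid_digit_count and its low/high parameters are made up for the example.

```python
import re


def has_valid_digit_count(phone, low=8, high=16):
    """Strip everything that is not a digit, then check how many digits are left."""
    digits = re.sub(r"[^0-9]", "", phone)
    return low <= len(digits) <= high


print(has_valid_digit_count("(12) 34 567 124"))    # True: 10 digits
print(has_valid_digit_count("12 34 56"))           # False: 6 digits
print(has_valid_digit_count("12345678901234567"))  # False: 17 digits
```

Unlike a single anchored regex, this accepts any non-digit characters at all, including letters, so it only answers the question "how many digits are there?".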
Use preg_replace to keep only the digits, then check for a valid length:
<?php
$ph = "(12) 34 567 124";
$len = strlen(preg_replace('/[^0-9]+/', '', $ph));
if($len >=8 && $len <=16)
echo "Valid";
else
echo "Invalid";
Don't even think about it. Phone numbers are complicated. They are hugely complicated. Google has a decent library for handling phone numbers, named libphonenumber.
And excuse me, but ignoring the "+" makes whatever you are doing totally, absolutely wrong. A plus is followed by the country code of some country, followed by a local phone number within that country (which needs to be parsed according to the rules of that country, and there are about 200). Without the "+", you have a phone number according to the local rules, and you need to find out which local rules apply. Which means your number can start with a code for dialing a foreign exchange instead of the "+", otherwise it is formatted according to local rules.
As a result, a number may be valid with the "+" and invalid without it or vice versa, and most likely refers to a different actual phone in totally different countries with or without the "+".

Find all substrings with at least one group

I am trying to find all substrings of a string that meet a condition.
Let's say we've got string:
s = 'some text 1a 2a 3 xx sometext 1b yyy some text 2b.'
I need to apply search pattern {(one (group of words), two (another group of words), three (another group of words)), word}. First three positions are optional, but there should be at least one of them. If so, I need a word after them.
Output should be:
2a 1a 3 xx
1b yyy
2b
I wrote this expression:
find_it = re.compile(r"((?P<one>\b1a\s|\b1b\s)|" +
r"(?P<two>\b2a\s|\b2b\s)|" +
r"(?P<three>\b3\s|\b3b\s))+" +
r"(?P<word>\w+)?")
Each group contains a set of different words (not just 1a, 1b), and I can't merge them into one group. A group should be None if it is empty. Obviously the result is wrong.
find_it.findall(s)
> 2a 1a 2a 3 xx
> 1b 1b yyy
I am grateful for your help!
You can use the following regex:
>>> reg=re.compile(r'((?:(?:[12][ab]|3b?)\s?)+(?:\w+|\.))')
>>> reg.findall(s)
['1a 2a 3 xx', '1b yyy', '2b.']
Here I just condensed your regex by using character classes and the ? quantifier. The core sub-pattern has two parts:
[12][ab]|3b?
[12][ab] will match 1a, 1b, 2a and 2b, and 3b? will match 3b and 3.
If you don't want the dot at the end of 2b, you can use the following regex, which uses a positive lookahead and is more general than the preceding one (making \s optional in the first group is not a good idea):
>>> reg=re.compile(r'((?:(?:[12][ab]|3b?)\s)+\w+|(?:(?:[12][ab]|3b?))+(?=\.|$))')
>>> reg.findall(s)
['1a 2a 3 xx', '1b yyy', '2b']
Also, if your numbers and example substrings are just instances, you can use [0-9][a-z]? as a more general pattern:
>>> reg=re.compile(r'((?:[0-9][a-z]?\s)+\w+|(?:[0-9][a-z]?)+(?=\.|$))')
>>> reg.findall(s)
['1a 2a 3 xx', '1b yyy', '2b']
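If the separate named groups from the question are needed (with None for an empty group), one option is to make each group optional and guard the start with a lookahead. This is only a sketch under a stated assumption: the tokens appear in the fixed order one, two, three, each at most once, which is narrower than the question's original pattern. The token lists are the ones from the example.

```python
import re

s = 'some text 1a 2a 3 xx sometext 1b yyy some text 2b.'

find_it = re.compile(
    r"\b(?=(?:1[ab]|2[ab]|3b?)\b)"  # a token from one of the groups must start here
    r"(?:(?P<one>1[ab])\b\s*)?"     # optional token from group one
    r"(?:(?P<two>2[ab])\b\s*)?"     # optional token from group two
    r"(?:(?P<three>3b?)\b\s*)?"     # optional token from group three
    r"(?P<word>\w+)?"               # optional word following the tokens
)

matches = list(find_it.finditer(s))
print([m.group(0) for m in matches])
# ['1a 2a 3 xx', '1b yyy', '2b']
print([m.groupdict() for m in matches])
# [{'one': '1a', 'two': '2a', 'three': '3', 'word': 'xx'},
#  {'one': '1b', 'two': None, 'three': None, 'word': 'yyy'},
#  {'one': None, 'two': '2b', 'three': None, 'word': None}]
```

The lookahead guarantees that at least one of the three groups matches, so a match never consists of a bare word.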

Regex expression if number string contains specific numbers

I need some help with creating a regex string. I have this long list of numbers:
7001 7002 7003 7004 7005 7006 7007 7008 7009 7010 7011 7012 7013 7014
7015 7016 7017 7018 7019 7020 7021 7022 7023 7024 7025 7026 7027 7028
7029 7030 7031 7032 7033 7034 7035 7036 7037 7038 7039 7040 7041 7042
7043 7044 7045 7046 7047 7048 7049 7050 7051 7052 7053 7054 7055 7056
7057 7058 7059 7060 7061 7062 7063 7064 7065 7066 7067 7068 7069 7070
7071 7072 7073 7074 7075 7076 7077 7078 7079 7080 7081 7082 7083 7084
7085 7086 7087 7088 7089 7090 7091 7092 7093 7094 7095 7096 7097 7098
7099 7100 7101 7102 7103 7104 7105 7106 7107 7108 7109 7110 7111 7112
7113 7114 7115 7116 7117 7118 7119 7120 7121 7122 7123 7124 7125 7126
7127 7128 7129 7130 7131 7132 7133 7134 7135 7136 7137 7138 7139 7140
7141 7142 7143 7144 7145 7146 7147 7148 7149 7150 7151 7152 7153 7154
7155 7156 7157 7158 7159 7160 7161 7162 7163 7164 7165 7166 7167 7168
7169 7170 7171 7172 7173 7174 7175 7176 7177
Basically, I need to find the numbers that contain an 8 or a 9 so I can remove them from the list.
I tried this regex: ([0-7][0-7][8-9]{2}) but that only matches numbers in which both of the last two digits are 8 or 9.
How about you just write some simple code rather than trying to cram everything into a regex?
#!/usr/bin/perl -i -p # Process the file in place
@n = split ' '; # Split on whitespace into array @n (this also drops the newline)
@n = grep { !/[89]/ } @n; # @n now contains only those numbers NOT containing 8 or 9
$_ = join( ' ', @n ) . "\n"; # Rebuild the line, restoring the newline
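The same "split, filter, join" approach reads much the same in Python. A minimal sketch, with drop_8_or_9 as a made-up helper name and a short excerpt standing in for the full list:

```python
def drop_8_or_9(line):
    """Keep only the whitespace-separated numbers that contain neither 8 nor 9."""
    kept = [n for n in line.split() if "8" not in n and "9" not in n]
    return " ".join(kept)


line = "7001 7008 7010 7019 7028 7077 7080 7090 7100"
print(drop_8_or_9(line))  # 7001 7010 7077 7100
```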
Dalorzo's answer would work, but I suggest a different approach:
/\b(?=\d{4}\b)(\d*[89]\d*)\b/g
Assuming you are only looking for 4-digit numbers, this uses a positive lookahead to ensure you have exactly four digits (so it won't match, say, 3- or 5-digit numbers) and then checks that at least one of the digits is 8 or 9.
http://regex101.com/r/hW4vQ3
If you need to catch all numbers, not just four digit ones, then
/\b(?=\d+\b)(\d*[89]\d*)\b/g
See it in action:
http://regex101.com/r/bW2gH3
And as a bonus, the regex is also capturing the numbers so you can do a replace afterwards, if you wish
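A short Python sketch of the 4-digit variant, showing both the capture and a follow-up removal. The sample text is invented, and the extra \s* in the removal pattern is an addition of this sketch so that the separator after a removed number goes away too.

```python
import re

text = "7007 7008 7019 789 70890 7080 7177"

# Capture the 4-digit numbers that contain an 8 or a 9.
found = re.findall(r"\b(?=\d{4}\b)(\d*[89]\d*)\b", text)
print(found)  # ['7008', '7019', '7080']

# Remove them, together with the whitespace that follows each one.
cleaned = re.sub(r"\b(?=\d{4}\b)\d*[89]\d*\b\s*", "", text).strip()
print(cleaned)  # 7007 789 70890 7177
```

The 3-digit and 5-digit numbers survive because the lookahead insists on exactly four digits between word boundaries.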
This is a bit long-winded, but easier to decipher:
/\b([89]\d{3}|\d[89]\d{2}|\d{2}[89]\d|\d{3}[89])\b/g
It also restricts the search to 4-digit groups.
How about:
/\b((?:[\d]+)?[89](?:[\d]+)?)\b/g
\b will match the beginning and the end of each number.
(?:[\d]+)? is an optional non-capturing group of digits. It is optional on both sides so that the [89] can appear at the beginning, at the end, or in the middle of the number.
?: makes a group non-capturing. It is not strictly needed in this expression, but there was no need to capture the sub-groups.
You can use this pattern:
[0-7]*(?:8[0-8]*9|9[0-9]*8)[0-9]*
or with a backreference:
(?:[0-9]*(?!\1)([89])){2}[0-9]*
re.findall(r"(\d\d[0-7][89])|(\d\d[89][0-7])|(\d\d[89][89])",x)
Works for the input given.
Slightly simpler regex with lookahead:
(?=\d*[89])\d+

How can I access capture buffers in brackets with quantifiers?

#!/usr/local/bin/perl
use warnings;
use 5.014;
my $string = '12 34 56 78 90';
say $string =~ s/(?:(\S+)\s){2}/$1,$2,/r;
# Use of uninitialized value $2 in concatenation (.) or string at ./so.pl line 7.
# 34,,56 78 90
With @LAST_MATCH_START and @LAST_MATCH_END it works*, but the line gets too long.
It doesn't actually work; see TLP's answer.
*"The proof of the pudding is in the eating" isn't always right.
say $string =~ s/(?:(\S+)\s){2}/substr( $string, $-[0], length($-[0]-$+[0]) ) . ',' . substr( $string, $-[1], length($-[1]-$+[1]) ) . ','/re;
# 12,34,56 78 90
You can't access all previous values of the first capturing group, only the last value (or the current at the match end, as you can see it) will be saved in $1 (unless you want to use a (?{ code }) hack).
For your example you could use something like:
s/(\S+)\s+(\S+)\s+/$1,$2,/
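The "only the last value is kept" behaviour is not specific to Perl. As an illustration, here is the same experiment in Python's re module, which treats a repeated capturing group the same way; this is a Python analogue, not the Perl code from the question.

```python
import re

string = "12 34 56 78 90"

# A quantified group keeps only its final iteration.
m = re.match(r"(?:(\S+)\s){2}", string)
print(repr(m.group(0)))  # '12 34 '
print(repr(m.group(1)))  # '34'  (the '12' from the first iteration is gone)

# Writing the group out twice gives two separately addressable captures.
result = re.sub(r"(\S+)\s+(\S+)\s+", r"\1,\2,", string, count=1)
print(result)  # 12,34,56 78 90
```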
The statement that you say "works" has a bug in it.
length($-[0]-$+[0])
will always return the length of the string form of a negative number, not the length of the match. $-[0] and $+[0] are the offsets of the start and end of the whole match, and $-[1] and $+[1] are the offsets of the last capture made by the first group. Start minus end is therefore the negated length: here -6 for the whole match and -2 for the group, and length() of either "-6" or "-2" is 2.
So, what you are doing is taking the first two characters of the match 12 34, and the first two characters of the capture 34, and joining them with commas. It works by coincidence, not because of capture groups.
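The arithmetic can be replayed step by step. The sketch below is a Python translation of what the Perl expression computes: m.start(n) and m.end(n) play the role of $-[n] and $+[n], and the variable names are made up.

```python
import re

string = "12 34 56 78 90"
m = re.match(r"(?:(\S+)\s){2}", string)

start0, end0 = m.start(0), m.end(0)  # whole match:   0 and 6
start1, end1 = m.start(1), m.end(1)  # last capture:  3 and 5

# length(start - end) measures the text of a negative number.
len0 = len(str(start0 - end0))  # len("-6") == 2
len1 = len(str(start1 - end1))  # len("-2") == 2

first = string[start0:start0 + len0]   # '12'
second = string[start1:start1 + len1]  # '34'
print(first + "," + second + ",")      # 12,34,
```

Both lengths come out as 2 only because the differences happen to be single-digit negatives, which is why the output looked right.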
It sounds as though you are asking us to solve the problems you have with your solution, rather than asking us about the main problem.