Regex pattern for validation where 2 sections vary in length interdependently - regex

I'm trying to work out a regex pattern for validating a string that consists of 2 parts that vary in length while the overall length remains the same.
Overall length = 7
start section alpha characters only 1-3 characters
end section 4-6 digits
combinations 1 Alpha + 6 digits or 2 Alpha + 5 digits or 3 Alpha + 4 digits.
In the second and third option the first character is allowed to be a space.
What I have so far is ^(?:([\sA-Z][A-Z]{2})(\d{4})|[\sA-Z]A-Z|A-Z)$
Can that be simplified?
How can I have an optional Alpha character at the end?

You need a look ahead to assert the overall length and a negative look ahead to prevent the "space digit" start:
^(?=.{7}$)(?! \d) ?[a-zA-Z]{1,3}\d{4,6}$
See a live demo with several edge cases.
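If it helps, here is a minimal Python sketch for checking the pattern above against each allowed shape and a few near misses. The helper name is_valid and the sample strings are made up for illustration; only the pattern itself comes from the answer.

```python
import re

# Pattern from the answer above: 7 characters in total, an optional leading
# space, 1-3 letters, then 4-6 digits.
PATTERN = re.compile(r"^(?=.{7}$)(?! \d) ?[a-zA-Z]{1,3}\d{4,6}$")

def is_valid(code: str) -> bool:
    """Return True if the whole string matches the pattern."""
    return PATTERN.fullmatch(code) is not None

# Hypothetical samples: the first five should pass, the rest should fail.
for sample in ["A123456", "AB12345", "ABC1234", " B12345", " BC1234",
               " 123456", "ABCD123", "A12345", "AB123456"]:
    print(repr(sample), is_valid(sample))
```

The lookahead fixes the total length, so the {1,3} and {4,6} ranges can only combine in ways that add up to 7 characters.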

This might work
# (?i)^(?=.{7}$)(?:[a-z]{1,3}|[ ][a-z]{2,3})\d{4,6}[a-z]?$
(?i) # Case independent
^ # BOL
(?= .{7} $ ) # 7 chars total
(?:
[a-z]{1,3} # 1 to 3 alpha
|
[ ] [a-z]{2,3} # or, space plus 2 to 3 alpha
)
\d{4,6} # 4 to 6 digits
[a-z]? # optional alpha char
$ # EOL
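The expanded form above can be tried almost verbatim in Python, assuming re.VERBOSE for the free-spacing layout and re.IGNORECASE in place of the inline (?i). This is only a sketch; the names VERBOSE_PATTERN and matches_verbose are invented here. Note that, as written, the space alternative requires at least two letters after the space, so " A12345" is rejected by this version.

```python
import re

# Free-spacing version of the pattern above. re.VERBOSE ignores unescaped
# whitespace and '#' comments outside character classes, so [ ] still
# matches a literal space.
VERBOSE_PATTERN = re.compile(r"""
    ^
    (?= .{7} $ )          # 7 chars total
    (?:
        [a-z]{1,3}        # 1 to 3 alpha
      |
        [ ] [a-z]{2,3}    # or, space plus 2 to 3 alpha
    )
    \d{4,6}               # 4 to 6 digits
    [a-z]?                # optional alpha char
    $
""", re.VERBOSE | re.IGNORECASE)

def matches_verbose(code: str) -> bool:
    """Return True if the whole string matches the verbose pattern."""
    return VERBOSE_PATTERN.fullmatch(code) is not None

for sample in ["ABC1234", " AB1234", "AB1234C", " A12345", "ABC123D"]:
    print(repr(sample), matches_verbose(sample))
```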

Related

Checking min no of characters in capturing group in Regex

I have a question in regex
I am dealing with numbers 0 and 1 only
I have 10 digit number grouped into 4 as below
([01]{2})([01]{4})([01]{2})([01]{2})
I need to match all those numbers with min 2 1's in the second group which is ([01]{4}) , no matter how many 0's or 1's other groups are having. I am interested only in the second group
For example, these are potential matches:
0000110000
0011000000
0001100000
0000110000
I tried using positive look ahead like :
^(\d{2})((?=\d*1{2,}\d*)(\d{4}))(\d{2})(\d{2})
but this is matching even
0000000011
Any help is deeply appreciated
If the two 1s are not necessarily consecutive in Group 2, you can use
^([01]{2})(?=(?:[01]*1){2}[01]{4,6}$)([01]{4})([01]{2})([01]{2})$
See the regex demo
Details:
^ - start of string
([01]{2}) - Group 1: two occurrences of 1 or 0
(?=(?:[01]*1){2}[01]{4,6}$) - immediately to the right of the current location, there must be two occurrences of any zero or more 0 or 1 chars followed with 1 and then there must be four, five or six 1 or 0 chars till the end of string
([01]{4}) - Group 2: four occurrences of 1 or 0
([01]{2}) - Group 3: two occurrences of 1 or 0
([01]{2}) - Group 4: two occurrences of 1 or 0
$ - end of string.
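A small Python sketch of how the pattern above behaves, in case it is useful. The helper name second_group and the extra sample strings are assumptions for illustration; the pattern is the one from this answer.

```python
import re

# Lookahead version: at least two 1s (not necessarily adjacent) in group 2.
GROUPED = re.compile(
    r"^([01]{2})(?=(?:[01]*1){2}[01]{4,6}$)([01]{4})([01]{2})([01]{2})$"
)

def second_group(bits: str):
    """Return the text of group 2 if the string matches, else None."""
    m = GROUPED.fullmatch(bits)
    return m.group(2) if m else None

for bits in ["0000110000", "0010100000", "0000000011", "0001000000"]:
    print(bits, second_group(bits))
```

The [01]{4,6}$ tail in the lookahead is what keeps the second 1 inside the four characters of group 2: it forces at least four characters to remain after that 1.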
If the ones need to be consecutive (as per your sample data), maybe you can use:
^(?=[01]{2,4}11)[01]{10}$
See the online demo. The idea here is that you match 2-4 zeros or ones up to a sequence of two ones. This makes sense once you realise that the only allowed combinations have the "11" sequence after exactly 2-4 other digits.
^ - Start line anchor.
(?=[01]{2,4}11) - Open positive lookahead to look for 2-4 characters from our character class up to "11".
[01]{10} - Match exactly 10 characters from our character class.
$ - End line anchor.
If need be, you can split the [01]{10} part into capture groups.
EDIT:
If they don't have to be consecutive, maybe you can work with:
^[01]{2}(?=[01]{8}$)([01]{0,2}1[01]{0,2}1[01]{0,2})[01]{4}$
See the online demo.
Or less verbose:
^(?=[01]{10}$)(..)(.*1.*1.*)(..)(..)$
See the demo
Not a job for regex but for bitwise operators:
(in PHP):
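$nums = [
    '0000110000',
    '0011000000',
    '0001100000',
    '1000110000',
    '0000000110',
    '0001000000'
];
foreach ($nums as $num) {
    // Shift out the last 4 bits and mask the next 4: that is the second group.
    // 0, 1, 2, 4 and 8 are the only 4-bit values with fewer than two bits set.
    if ( !in_array((bindec($num) >> 4) & 15, [0, 1, 2, 4, 8]) )
        echo $num, PHP_EOL;
}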
You can probably do that in any language.
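For instance, the same idea could be sketched in Python roughly like this. The function name second_group_has_two_ones is made up, and counting set bits is swapped in for the lookup list used in the PHP version.

```python
def second_group_has_two_ones(bits: str) -> bool:
    """bits is assumed to be a 10-character string of 0s and 1s."""
    # Drop the last 4 bits (groups 3 and 4), then keep the 4 bits of group 2.
    group2 = (int(bits, 2) >> 4) & 0b1111
    # Count the set bits instead of listing the values 0, 1, 2, 4, 8.
    return bin(group2).count("1") >= 2

nums = ["0000110000", "0011000000", "0001100000",
        "1000110000", "0000000110", "0001000000"]
print([n for n in nums if second_group_has_two_ones(n)])
```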
If lookaheads are supported, you could also use a positive lookahead to assert that group 2 contains 11.
^([01]{2})(?=[01]{0,2}11)([01]{4})([01]{2})([01]{2})$
^ Start of string
([01]{2}) - Group 1: two occurrences of 1 or 0
(?= Positive lookahead
[01]{0,2}11 Match 0-2 times either 0 or 1 and match 11
) Close lookahead
([01]{4}) - Group 2: four occurrences of 1 or 0
([01]{2}) - Group 3: two occurrences of 1 or 0
([01]{2}) - Group 4: two occurrences of 1 or 0
$ - end of string.
Regex demo
Or you can write out all 3 alternatives matching 11
^([01]{2})(11[01][01]|[01]11[01]|[01][01]11)([01]{2})([01]{2})$
Regex demo
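Since there are only 1024 possible 10-digit inputs, the two patterns above can be compared exhaustively. A rough Python sketch; the variable names are invented for this example.

```python
import re
from itertools import product

# The two patterns from this answer: lookahead form and written-out alternatives.
LOOKAHEAD = re.compile(
    r"^([01]{2})(?=[01]{0,2}11)([01]{4})([01]{2})([01]{2})$")
ALTERNATION = re.compile(
    r"^([01]{2})(11[01][01]|[01]11[01]|[01][01]11)([01]{2})([01]{2})$")

# Every string of ten 0/1 characters.
all_inputs = ["".join(p) for p in product("01", repeat=10)]

# Inputs on which the two patterns give different answers.
disagreements = [s for s in all_inputs
                 if bool(LOOKAHEAD.match(s)) != bool(ALTERNATION.match(s))]
matched = [s for s in all_inputs if LOOKAHEAD.match(s)]
print(len(disagreements), len(matched))
```

An empty disagreements list means both forms accept exactly the same inputs, so the choice between them is mostly about readability and whether the regex flavour supports lookaheads.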

regex input validation for mobile number 03025498448 using C#

I am trying to write a regex to validate an 11-digit mobile number such as 03025398448, where the first 3 digits are always 030 and the remaining 8 digits can be any digit from 0 to 9. The leading 0 may also be written in +92 format. Please help me with the regex for this.
If the number should start with 030 and +92 is optional and when using +92 you should omit the leading zero, you could use:
^(?:\+9230|030)?\d{8}$
Explanation
^ # From the beginning of the string
(?: # Non capturing group
\+9230|030 # Match +9230 or 030
)? # close non-capturing group and make it optional
\d{8} # Match 8 digits
$ # The end of the string
In C# you could use this as string pattern = @"^(?:\+9230|030)?\d{8}$";
C# code
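As a quick check of what this pattern accepts, here is a sketch in Python (used only for illustration; the pattern relies on constructs that behave the same way in this case, and the sample numbers are made up). Because the prefix group is optional, a bare 8-digit string also passes; dropping the ? after the group would make the prefix mandatory.

```python
import re

# Pattern from the answer above, unchanged.
MOBILE = re.compile(r"^(?:\+9230|030)?\d{8}$")

for number in ["03025398448", "+923025398448", "25398448", "03125398448"]:
    print(number, bool(MOBILE.fullmatch(number)))
```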
You can use this regular expression:
^((\+?92)30[0-9]{8}|030[0-9]{8})$
Explanation
BeginOfLine
CapturingGroup
  GroupNumber:1
  OR: match either of the followings
    Sequence: match all of the followings in order
      CapturingGroup
        GroupNumber:2
        Sequence: match all of the followings in order
          Repeat
            +
            optional
          9 2
      3 0
      Repeat
        AnyCharIn[ 0 to 9]
        8 times
    Sequence: match all of the followings in order
      0 3 0
      Repeat
        AnyCharIn[ 0 to 9]
        8 times
EndOfLine

RegEx in Notepad++ w/Find and replace

I have data that looks like this:
1 ,11/10/2015, 1 3
2 ,01/15/2013
3 ,04/10/2015, 5 5
4 ,04/01/2013, 165
5 ,07/01/2016, 311 312
I need to find every instance that looks like lines 1, 3, and 5 and replace the white space in between the 2 sets of digits with a comma so they become like:
1 ,11/10/2015, 1,3
2 ,01/15/2013
3 ,04/10/2015, 5,5
4 ,04/01/2013, 165
5 ,07/01/2016, 311,312
I'm close with this:
[^(^\d{1,3})][[^(\d{1,3})]\s+(\d{1,3})\r
, but it's keeping the 2 sets of digits AND the white space. Need to isolate the finds to just the white space in between the 2 sets of digits. The leading numbers (1-5) are not in my data set. Just included these for readability here.
If there is only one whitespace-separated digit pair per line, you may use
(\d+)\h+(\d+)
and replace with $1,$2.
If you need to define some more context and make the regex replacement safer, consider
,\h*\K(\d+)\h+(\d+)$
Details:
, - a comma
\h* - 0+ horizontal whitespaces
\K - omit all the text matched so far
(\d+) - Group 1: one or more digits
\h+ - 1+ horizontal whitespaces
(\d+) - Group 2: one or more digits
$ - end of line.
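Outside Notepad++, the same replacement could be sketched in Python. Python's re module has no \h or \K, so this version assumes [ \t] as a stand-in for horizontal whitespace and captures the text before the digits so it can be put back. The sample data is the question's, without the leading line numbers.

```python
import re

data = """,11/10/2015, 1 3
,01/15/2013
,04/10/2015, 5 5
,04/01/2013, 165
,07/01/2016, 311 312"""

# Group 1 keeps the comma and any spaces (the part \K would have discarded);
# groups 2 and 3 are the two digit runs at the end of the line.
fixed = re.sub(r"(,[ \t]*)(\d+)[ \t]+(\d+)$", r"\1\2,\3", data,
               flags=re.MULTILINE)
print(fixed)
```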

Regex: find a variable number sequence

I am looking for a specific sequence of numbers in a line. I can best explain with an example:
00001 # first search criteria - line 1
00010 # second search criteria - line 2
So every line has 5 digits of either 0 or 1. I am looking for the combination of all 0 except for 1 digit that can be a 1. This 1 can be in any position of the 5 digits.
The regex code I have for 5 digits of 0 is
^((0\s*?){5}) # there may be spaces between the numbers
The line 1 case above would be selected with following regex code:
^((0\s*?){4})\s*(1)
My question is how I could write in regex code the changing position of 1 to cover the 5 cases/positions.
Thank you.
You can use a lookahead based regex for this:
^(?=[0\s]*1[0\s]*$)(?:\s*[01]\s*){5}$
RegEx Demo
The lookahead (?=[0\s]*1[0\s]*$) enforces exactly one 1 at any position in the input, whereas (?:\s*[01]\s*){5} makes sure that the input contains only 0s and 1s, is 5 digits long, and allows whitespace anywhere.
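A short Python sketch to try the lookahead pattern on a few lines. The name ONE_HOT and the samples other than the question's two lines are made up for illustration.

```python
import re

# Exactly one 1 among five 0/1 digits, with optional whitespace between them.
ONE_HOT = re.compile(r"^(?=[0\s]*1[0\s]*$)(?:\s*[01]\s*){5}$")

for line in ["00001", "00010", "0 0 1 0 0", "00000", "00011", "0001"]:
    print(repr(line), bool(ONE_HOT.match(line)))
```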
You can use two conditionals.
First one ensures 1 is not found again.
Second one ensures 1 is found.
(?:((?(1)(?!))1)|0){5}(?(1)|(?!))
Expanded
(?:
( # (1 start)
(?(1) (?!) ) 1
) # (1 end)
| 0
){5}
(?(1) | (?!) )

Regex to Extract Last Part of URL that Contains User ID Strings

I'm having a hard time figuring this one out and could use some help.
I'm using Google Analytics filters to reduce the number of unique pages being reported in our app by stripping out ID strings from the URLs that are coming in.
What I need is a regex that will look for URLs that have these IDs in the URL. Here's what sets them apart from the rest of the URL:
ID strings are always the last part of the URL
ID strings always contain both letters and numbers
ID strings are always either 16- or 32-characters in length
ID strings can show up twice in a URL
ID strings can end with either a "/" or without
Here are some example URLs that show how they appear in our reporting:
/app/6be031b9672be9b5/
/app/admin/client/settings/6be031b9672be9b5
/app/subscribers/ea33fb38c9efc4dc0367819f23434f99/
/app/subscribers/customfieldsettings/0359c487066727ae/
/app/reports/6fa92d36be0e6c16/dc5aa096fba9cbb97eea1dae616d4b3c/
The second part of my question is that this regex should also group everything before these ID strings into a capturing group so that I can call that group later on in the filter, effectively stripping out these ID strings to look like the following:
/app/6be031b9672be9b5/ --> /app/
/app/subscribers/ea33fb38c9efc4dc0367819f23434f99/ --> /app/subscribers/
etc.
I've tried a couple different approaches but none seem to work perfectly, so I could really use the help, thank you!
Here's a solution:
^(.*?)(?:\/[a-zA-Z0-9]{16}|\/[a-zA-Z0-9]{32}){0,2}\/?$
Demo
This will remove the last part or 2 parts of URLs which are 16 or 32 characters long and contain only letters and digits.
You can make sure these parts contain both letters and numbers like this, if the tool supports lookaheads:
^(.*?)(?:\/(?=.{0,15}?\d)(?=.{0,15}?[a-zA-Z])[a-zA-Z0-9]{16}|\/(?=.{0,31}?\d)(?=.{0,31}?[a-zA-Z])[a-zA-Z0-9]{32}){0,2}\/?$
Demo
This adds assertions to the pattern.
Breakdown:
^(.*?) # Start of URL
(?:
\/ # a slash
(?=.{0,15}?\d) # check there's a digit at most 16 chars ahead
(?=.{0,15}?[a-zA-Z]) # check there's a letter at most 16 chars ahead
[a-zA-Z0-9]{16} # check the next 16 chars are digits or letters
| # .. or:
\/ # a slash
(?=.{0,31}?\d) # check there's a digit at most 32 chars ahead
(?=.{0,31}?[a-zA-Z]) # check there's a letter at most 32 chars ahead
[a-zA-Z0-9]{32} # check the next 32 chars are digits or letters
){0,2} # .. at most 2 times
\/?$ # optional slash at end
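To see what group 1 captures for the example URLs, here is a rough Python sketch of the lookahead version. The function name strip_ids is an assumption, and the trailing slash is appended by hand because group 1 stops before the slash that precedes the ID. The slashes are left unescaped since Python does not need them escaped.

```python
import re

# Lookahead version from above: each ID segment must be 16 or 32
# alphanumerics and contain at least one digit and one letter.
STRIP_IDS = re.compile(
    r"^(.*?)(?:/(?=.{0,15}?\d)(?=.{0,15}?[a-zA-Z])[a-zA-Z0-9]{16}"
    r"|/(?=.{0,31}?\d)(?=.{0,31}?[a-zA-Z])[a-zA-Z0-9]{32}){0,2}/?$"
)

def strip_ids(path: str) -> str:
    """Return the path with up to two trailing ID segments removed."""
    m = STRIP_IDS.match(path)
    # Group 1 excludes the slash before the ID, so add one back.
    return m.group(1) + "/" if m else path

for url in ["/app/6be031b9672be9b5/",
            "/app/admin/client/settings/6be031b9672be9b5",
            "/app/reports/6fa92d36be0e6c16/dc5aa096fba9cbb97eea1dae616d4b3c/"]:
    print(url, "-->", strip_ids(url))
```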
This will do it:
([a-z0-9]+)(?:\/?$)
Demo
Explanation:
([a-z0-9]+) matches and captures the alphanumeric part
(?:\/?$) matches (but doesn't capture) the optional final / and then the end of the string ($)
modified - totally missed that can be 1 or 2 id's at the end thing.
Oh well, revised fwiw.
# (?i)^(.*?)/((?:(?=[^/]{0,31}[a-f])(?=[^/]{0,31}[0-9])(?:[a-f0-9]{16}|[a-f0-9]{32})(?:(?:/[a-z])?/?$|/)){1,2})$
(?i) # Case insensitive modifier
^ # BOS, begin the ride ..
( .*? ) # (1), Kreep up on the first ID
/ # Trim this / junk
( # (2 start), 1-2 ID's separated by a /
(?:
(?= [^/]{0,31} [a-f] ) # Use largest range (32), Must be a letter AND number
(?= [^/]{0,31} [0-9] )
(?: # One of 16 or 32 length
[a-f0-9]{16}
| [a-f0-9]{32}
)
(?:
(?: / [a-z] )? # optional / letter
/? $ # /? EOS for end of 1 or 2
| # or,
/ # / between 2 only
)
){1,2}
) # (2 end)
$ # EOS, rides over !!
Sample output:
** Grp 0 - ( pos 195 , len 63 )
/app/reports/6fa92d36be0e6c16/dc5aa096fba9cbb97eea1dae616d4b3c/
** Grp 1 - ( pos 195 , len 12 )
/app/reports
** Grp 2 - ( pos 208 , len 50 )
6fa92d36be0e6c16/dc5aa096fba9cbb97eea1dae616d4b3c/