Regular expression to capture first n digits from comma separated strings - regex

I quickly found a working multi-line regular expression for my needs, but I'm having trouble converting it to handle a single line.
Consider this input with the regex /^[2-9]\d{1}(?:\s){0}/gm applied:
4126-54D429-001,
5149-A42102-002,
9251-Z48910-003
...
However, when I put the input on one line, I only get the first two digits in the output:
4126-54D429-001, 5149-A42102-002, 9251-Z48910-003 ...
How can this regexp be written to get this capture:
4126-54D429-001, 5149-A42102-002, 9251-Z48910-003 ... ?

This Should Work.
REGEXP
\b\d{2}(?=\d{2})
INPUT
4126-54D429-001, 5149-A42102-002, 9251-Z48910-003, 7851-Z48910-003
OUTPUT
41
51
92
78
The comma is not essential
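For reference, a minimal Python sketch of the pattern above. The question does not name a language, so the use of Python's re module and the variable names are assumptions made only for illustration:

```python
import re

# Input line from the answer above.
text = "4126-54D429-001, 5149-A42102-002, 9251-Z48910-003, 7851-Z48910-003"

# \b requires a word boundary, \d{2} takes two digits, and the lookahead
# (?=\d{2}) requires two more digits to follow without consuming them.
pattern = re.compile(r"\b\d{2}(?=\d{2})")

first_two = pattern.findall(text)
print(first_two)  # ['41', '51', '92', '78']
```

The lookahead is what rejects the shorter digit runs such as 54 (followed by D) and 00 (followed by a single digit).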

This will capture the first two digits of each in groups:
(\d{2})[^,]*
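A short sketch of how this could be used, assuming Python's re module (findall returns the contents of the single capture group):

```python
import re

text = "4126-54D429-001, 5149-A42102-002, 9251-Z48910-003"

# (\d{2}) captures two digits; [^,]* then consumes the rest of the item up to
# the next comma, so later digit runs in the same item are skipped.
groups = re.findall(r"(\d{2})[^,]*", text)
print(groups)  # ['41', '51', '92']
```

Note that the pattern is not tied to the start of an item: if an item began with a letter, the first pair of consecutive digits found later in that item would be captured instead.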

Related

Regular Expression Extracting Text from a group

I have a filename like this:
0296005_PH3843C5_SEQ_6210_QTY_BILLING_D_DEV_0000000000000183.PS.
I needed to break the name down into groups separated by underscores, which I did like this:
(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)(\d{16})(.*)
So far so good.
Now I need to extract characters from one of the groups. For example, in group 2 I need the first 3 and 8 digits (keep in mind they could be letters too).
So I tried something like this:
(.*?)_([38]{2})(.*?) _(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)(\d{16})(.*)
It didn’t work but if I do this:
(.*?)_([PH]{2})(.*?) _(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)(\d{16})(.*)
It will pull the PH into a group, but not the 38. So I'm lost at this point.
Any help would be great.
Try the regex below to match any three letters/digits followed by one digit:
(.*?)_([A-Z0-9]{3}[0-9]{1})(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)
Try the regex below to match any three letters/digits followed by one letter/digit:
(.*?)_([A-Z0-9]{3}[A-Z0-9]{1})(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)
If the first two letters are a constant like "PH", then try the one below:
(.*?)_([PH]+[0-9A-Z]{2})(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)
I am assuming that you are trying to match a group 2 that starts with digits. If that is the case, then you have to change the source string to something like
0296005_383843C5_SEQ_6210_QTY_BILLING_D_DEV_0000000000000183.PS.
It works, check it out at https://regex101.com/r/zem3vt/1
Using [^_]* performs much better in your case than .*? since it doesn't backtrack. So changing your original regex from:
(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)(\d{16})(.*)
to:
([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*?)(\d{16})(.*)
reduces the number of steps from 114 to 42 for your given string.
The best method might be to actually split your string on _ and then test the second element to see if it contains 38. Since you haven't specified a language, I can't show how to do it in yours, but most languages provide a contains or indexOf method that can be used to determine whether a substring exists in a string.
Using regex alone, however, this can be accomplished using the following regular expression.
Ensuring 38 exists in the second part:
([^_]*)_([^_]*38[^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*?)(\d{16})(.*)
Capturing the 38 in the second part:
([^_]*)_([^_]*)(38)([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*?)(\d{16})(.*)
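Since the question names no language, here is a hedged Python sketch of both ideas from this answer: splitting on _ and testing the second element, and the capturing regex above. The filename is the one from the question; the 16-digit counter is built with zfill purely so its length is explicit, and all variable names are illustrative:

```python
import re

# Filename from the question (183 zero-padded to 16 digits).
filename = "0296005_PH3843C5_SEQ_6210_QTY_BILLING_D_DEV_" + "183".zfill(16) + ".PS"

# Approach 1: split on "_" and test the second element.
fields = filename.split("_")
second = fields[1]
has_38 = "38" in second
print(second, has_38)

# Approach 2: the capturing regex from the answer, unchanged.
pattern = re.compile(
    r"([^_]*)_([^_]*)(38)([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*?)(\d{16})(.*)"
)
m = pattern.match(filename)
if m:
    # Groups 2-4 are the text before 38, the 38 itself, and the text after it.
    print(m.group(2), m.group(3), m.group(4))
```

With the split approach there is no need to keep the group numbering in sync when the pattern changes, which is one reason it may be easier to maintain.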

Regular expression not containing 101

I came across the regular expression not containing 101 as follows:
0∗1∗0∗+(1+00+000)∗+(0+1+0+)∗
I was unable to understand how the author came up with this regex. So I just thought of a string which does not contain 101:
01000100
It seems that the above string will not be matched by the above regex, but I was unsure. So I tried translating it to an equivalent PCRE regex on regex101.com, but failed there too (as can be seen, my regex does not even match a string containing a single 1).
What's wrong with my translation? Is the above regex indeed correct? If not, what would the correct regex be?
Here is a somewhat shorter expression: ^0*(1|00+)*0*$
https://www.regex101.com/r/gG3wP5/1
Explanation:
(1|00+)*: we can mix zeroes and ones as long as the zeroes occur in groups of two or more
^0*...0*$: we can have as many zeroes as we want in the prefix/suffix
A direct translation of the original regexp would be:
^(0*1*0*|(1|00|000)*|(0+1+0+)*)$
Update
This seems like an artificially complicated version of the regexp above:
(1|00|000)* is the same as (1|00+)*
it is almost the solution, but it does not match the strings 0, 01.., and ..10
0*1*0* doesn't match strings with 101 inside, but matches 0 and some of 01.. and ..10
we still need to match those of 01.. and ..10 which have 0s and 1s mixed inside, e.g. 01001.. or ..10010
(0+1+0+)* matches some of the remaining cases, but there are still some valid strings left unmatched
e.g. 10010 is one of the shortest strings not matched by any of the alternatives.
So, this solution is overly complicated and not complete.
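The claims above can be checked by brute force. The following is a small sketch, assuming Python's re module and an arbitrary length limit of 10; it compares the short expression against a plain substring test and lists the valid strings that the direct translation misses:

```python
import re
from itertools import product

short = re.compile(r"^0*(1|00+)*0*$")
direct = re.compile(r"^(0*1*0*|(1|00|000)*|(0+1+0+)*)$")

def binary_strings(max_len):
    """Yield every string over {0, 1} of length 0..max_len."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            yield "".join(bits)

# Strings on which the short regex disagrees with "does not contain 101".
mismatches_short = [
    s for s in binary_strings(10) if bool(short.match(s)) != ("101" not in s)
]

# Strings without 101 (up to length 5) that the direct translation rejects.
missed_by_direct = [
    s for s in binary_strings(5) if "101" not in s and not direct.match(s)
]

print(mismatches_short)  # []
print(missed_by_direct)  # ['01001', '10010']
```

An empty first list only shows agreement up to the chosen length; it is a sanity check, not a proof.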
Read the explanation in the right-side tab on regex101; it tells you what your regex does. I think you misunderstood what the list operator does: inside a list operator ([ ]), other characters such as ( are no longer metacharacters, so the expression [(0*1*0*)[1(00)(000)] is equivalent to [01()*[], which matches 0 or 1 or ( or ) or * or [
The correct translation of the regular expression 0∗1∗0∗+(1+00+000)∗+(0+1+0+)∗
will be as follows:
^((?:0*1*0*)|(?:1|00|000)*|(?:0+1+0+)*)$
What your regex [(0*1*0*)[1(00)(000)]*(0+1+0+)*] does:
[(0*1*0*)[1(00)(000)]* -> matches any of the characters 0, 1, (, ), *, [ zero or more times, followed by
(0+1+0+)* --> matches the pattern 0+1+0+ zero or more times, followed by
] --> matches the character ]
so your expression is equivalent to
[([)01*]*(0+1+0+)*] which is not a regular expression that matches strings not containing 101
0* 1* ( (00+000)* 1*)* (ε+0)
I think this expression covers all cases, because:
any number greater than 1 can be broken into 2s and 3s, i.e. any n ≥ 2 can be written as n = 2*i + 3*j. So there can be any number of 0s between two consecutive 1s except exactly one 0. Hence, 101 cannot be obtained.
ε+0 covers strings ending in a single 0.
The RE for the language not containing 101 as a substring can also be written as (0*1*00)*.0*.1*.0*
This may be smaller than the one you are using. Try to make use of it.
The regular expression I got is (0+10)*1* (looks simple :P)
I just considered all cases to make this.
If you consider two consecutive 1s, we have to end up with continuous 1s:
case 1: 11111111111111...
case 2: 0000000011111111111111... (once we take two 1s we can't accept 0s, so the one and only option is to continue with 1s)
If you consider only a single 1 followed by a 0, there is no issue, and after a single 1 we can have any number of 0s:
case 3: 00000000 10100100010000100000100000 1111111111
=> (0*+10*)*1*
Final answer: (0+10)*1*
Thanks for your patience.

Regular expression for secret code

I've created a text field which accepts a product code.
I have tried many approaches without success.
The product code has the following validation rules:
Product code: 315299AZ
1. The first 2 digits range from 01 to 31 and must not be 00.
2. The second 2 digits range from 01 to 52 and must not be 00.
3. The third 2 digits range from 00 to 99.
4. The last 2 characters are optional, but only letters are accepted, not digits.
Could someone please help me with this?
You can use the following regex :
(?!00)(([0-2][0-9])|31|30)(?!00)(([0-4][0-9])|51|50|52)(\d{2})([a-zA-Z]{2})?
(?!00) is a negative look-ahead that doesn't allow 00.
There you go:
((0[1-9])|([1-2]\d)|(3[0-1]))((0[1-9])|([1-4]\d)|(5[0-2]))\d{2}([a-zA-Z]{2})?
If you don't like look-aheads.
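As a quick sanity check, the two regexes above can be compared by brute force. This is only a sketch, assuming Python's re module; fullmatch() stands in for explicit ^...$ anchors (which neither pattern includes), and the third pair is fixed at 00 to keep the search small:

```python
import re

# The two patterns from the answers above, unchanged.
with_lookahead = re.compile(
    r"(?!00)(([0-2][0-9])|31|30)(?!00)(([0-4][0-9])|51|50|52)(\d{2})([a-zA-Z]{2})?"
)
without_lookahead = re.compile(
    r"((0[1-9])|([1-2]\d)|(3[0-1]))((0[1-9])|([1-4]\d)|(5[0-2]))\d{2}([a-zA-Z]{2})?"
)

# Every possible pair of leading two-digit fields, followed by "00".
candidates = [f"{n:04d}00" for n in range(10000)]
accepted_a = {c for c in candidates if with_lookahead.fullmatch(c)}
accepted_b = {c for c in candidates if without_lookahead.fullmatch(c)}

same_language = accepted_a == accepted_b
print(same_language, len(accepted_a))  # True 1612  (31 * 52 combinations)

sample_ok = bool(with_lookahead.fullmatch("315299AZ"))
sample_bad = bool(with_lookahead.fullmatch("315299A"))
print(sample_ok, sample_bad)  # True False
```

Without anchoring (or fullmatch), either pattern would also accept a valid code embedded in a longer string, which is usually not what a form validation wants.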
I know it's not the spirit, but any sensible language supporting regular expressions should allow you to access groups, hence do something along these lines (pseudocode follows):
if product_code matches /^(\d\d)(\d\d)\d\d([a-zA-Z]{2})?$/ {
assert 1 <= int($1) <= 31 // validate first group
assert 1 <= int($2) <= 52 // validate second group
}
Bonus: you can actually read it.
(This is assuming the last optional group contains either two or zero characters. If one character is acceptable, you can replace it with [a-zA-Z]{0,2})
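One possible runnable version of the pseudocode above, written in Python as an assumption (the question does not say which language is used). The function name is hypothetical, and re.ASCII is added so that \d only accepts 0-9:

```python
import re

# Shape check only: three pairs of digits, then an optional pair of letters.
CODE_RE = re.compile(r"(\d\d)(\d\d)\d\d([a-zA-Z]{2})?", re.ASCII)

def is_valid_product_code(code: str) -> bool:
    """Return True if code has the right shape and both ranges hold."""
    m = CODE_RE.fullmatch(code)
    if m is None:
        return False
    first, second = int(m.group(1)), int(m.group(2))
    # Range checks are done numerically instead of inside the regex.
    return 1 <= first <= 31 and 1 <= second <= 52

print(is_valid_product_code("315299AZ"))  # True
print(is_valid_product_code("005299AZ"))  # False (first pair is 00)
print(is_valid_product_code("315399"))    # False (second pair is 53)
```

Keeping the ranges as plain integer comparisons means a later change (say, a different upper bound) does not require rewriting the pattern.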

How to match a one of a set of numbers?

I am trying to match a group of numbers in regex that consist of one of the following:
1,2,3,4,5,6,7,8,9,10,11
But I am having trouble figuring out the regex.
For single digits this pattern worked fine "0|1|2|3|4|5|6|7|8|9" but it fails on double digit numbers. For example 12 passes as ok due to the regex finding the 1 in 12.
You can use begin and end anchors to force the whole string to be matched:
^(0|1|2|3|4|5|6|7|8|9|10|11)$
Which can be shortened to:
^(\d|10|11)$
This will work if you want to check whether a single number is between 0 and 11:
^[0-9]$|^1?[0-1]$
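A small check of the anchored patterns above, assuming Python's re module; it runs each pattern against the numbers 0 to 99 written as strings and records which ones are accepted:

```python
import re

patterns = [r"^(\d|10|11)$", r"^[0-9]$|^1?[0-1]$"]

# For each pattern, collect the numbers below 100 whose decimal form matches.
results = {
    p: [n for n in range(100) if re.search(p, str(n))]
    for p in patterns
}

for p, accepted in results.items():
    print(p, accepted)  # both accept exactly 0..11
```

Because of the anchors, 12 is rejected: the pattern can no longer succeed by matching only the leading 1.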
If you want to match a string like:
1,2,3,12,32,5,1,6,8, 11
and match 0-11 then you can use the following:
(?<=,|^)([0-9]|1?[0-1])(?=,|$)
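The lookbehind (?<=,|^) mixes alternatives of different widths, which some engines do not accept; as far as I know, Python's built-in re module requires fixed-width lookbehind. A sketch of an equivalent formulation for that case, using negative lookarounds instead (an adaptation, not the pattern from the answer):

```python
import re

# (?<![^,]) : not preceded by a non-comma character (start of string or comma)
# (?![^,])  : not followed by a non-comma character (comma or end of string)
pattern = re.compile(r"(?<![^,])([0-9]|1?[0-1])(?![^,])")

from_answer = pattern.findall("1,2,3,12,32,5,1,6,8, 11")
print(from_answer)  # ['1', '2', '3', '5', '1', '6', '8']

no_spaces = pattern.findall("0,10,11,12")
print(no_spaces)  # ['0', '10', '11']
```

In the first string the final 11 is skipped because it is preceded by a space rather than a comma; stripping whitespace first (or allowing it in the lookarounds) would be needed to pick it up.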
Use this regex: ^(0|1|2|3|4|5|6|7|8|9|(10)|(11))$

How do I write a Regular Expression to match any three digit number value?

I'm working with some pretty funky HTML markup that I inherited, and I need to remove the following attributes from about 72 td elements.
sdval="285"
I know I can do this with find/replace in my code editor, except since the value of each attribute differs in 5 degree increments, I can't match them all without a Regular Expression. (FYI I'm using Esspress and it does support RegExes in its Find/Replace tool)
Only trouble is, I really can't figure out how to write a RegEx for this value. I understand the concept of RegExes, but really don't know how to use them.
So how would I write the following with a Regular Expression in place of the digits so that it would match any three digit value?
sdval="285"
/sdval="\d{3}"/
EDIT:
To answer your comment, \d in regular expressions means match any digit, and the {n} construct means repeat the previous item n times.
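Outside an editor, the same pattern can be applied with a script. A minimal sketch, assuming Python's re module; the sample markup is made up for illustration, and \s+ is added in front so the space before the attribute is removed as well:

```python
import re

# Hypothetical markup resembling the td elements described in the question.
html = '<td sdval="285" align="left">285</td><td sdval="290">290</td>'

# Remove every sdval attribute whose value is exactly three digits.
cleaned = re.sub(r'\s+sdval="\d{3}"', "", html)
print(cleaned)  # <td align="left">285</td><td>290</td>
```

The cell contents (285, 290) are untouched because the pattern requires the literal sdval=" prefix and the closing quote.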
Easiest, most portable: [0-9][0-9][0-9]
More "modern": \d{3}
This should do (ignores leading zeros):
[1-9][0-9]{0,2}
import re
data = "719"
data1 = "79"
# This expression matches a run of one, two, or three digits
expression = r"\d{1,3}"
match = re.search(expression, data)
print("expression :", match.group() if match else None)
# This expression matches exactly three consecutive digits, so "79" gives no match
expression1 = r"\d{3}"
match1 = re.search(expression1, data1)
print("expression1 :", match1.group() if match1 else None)
Output :
expression : 719
expression1 : None
It sounds like you're trying to do a find / replace in Visual Studio of a 3 digit number (references to Express and Find/Replace tool). If that's the case the regex to find a 3 digit number in Visual Studio is the following
<:d:d:d>
Breakdown
The < and > establish a word boundary to make sure we don't match a number subset.
Each :d entry matches a single digit.