Regex: match patterns

I have a string of numbers that must follow certain rules:
Each element can start with tel1 or tel2 (or not).
If E appears in a number, it must be followed by 4 digits and then by 49. (The E part is optional.)
So a string like:
tel1: +E1234498912345678,tel2: +498912345678,tel1: +E123449D1238912345678,tel2: +E1234498912345678
is valid
tel1: +E12344598912345678,tel2: +498912345678,tel1: +E123449D1238912345678,tel2: +E1234498912345678
is invalid (the first element is invalid because E1234 is followed by 45 instead of 49).
Also, each number must begin with +, as in the examples.
UPDATE: It also needs to match numbers with a '#' suffix.

This will work:
^((\s*tel[12]:\s*)?\+(E\d{4}49|\d)[^,]*(,|$))+$
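To check this pattern against the examples from the question, here is a small sketch using Python's re module (the language is an assumption; the question does not name one). The with_hash string is a made-up sample for the '#' suffix from the update:

```python
import re

# Pattern from the answer above, unchanged.
TEL_LIST = re.compile(r'^((\s*tel[12]:\s*)?\+(E\d{4}49|\d)[^,]*(,|$))+$')

valid = ("tel1: +E1234498912345678,tel2: +498912345678,"
         "tel1: +E123449D1238912345678,tel2: +E1234498912345678")
invalid = ("tel1: +E12344598912345678,tel2: +498912345678,"
           "tel1: +E123449D1238912345678,tel2: +E1234498912345678")
# Hypothetical sample: a number with a '#' suffix, then one without a tel prefix.
with_hash = "tel1: +498912345678#,+E1234498912345678"

for s in (valid, invalid, with_hash):
    print(bool(TEL_LIST.match(s)))  # True, False, True
```

The '#' suffix passes because [^,]* accepts any run of non-comma characters after the E prefix or the first digit.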

^((tel1|tel2)?(:\s*)?\+(E)?\d{4}49\w+(,|$)|(tel1|tel2)?(:\s*)?\+(?!E)\w+(,|$))+$
You can try this. See the demo:
http://regex101.com/r/iO1uK1/3
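A similar Python check of this pattern (with a made-up '#' sample) suggests it does not cover the '#' suffix from the update, because \w matches letters, digits, and underscore but not #:

```python
import re

# Pattern from the answer above, unchanged.
TEL_W = re.compile(r'^((tel1|tel2)?(:\s*)?\+(E)?\d{4}49\w+(,|$)|(tel1|tel2)?(:\s*)?\+(?!E)\w+(,|$))+$')

valid = ("tel1: +E1234498912345678,tel2: +498912345678,"
         "tel1: +E123449D1238912345678,tel2: +E1234498912345678")
# Hypothetical sample with the '#' suffix from the question's update.
with_hash = "tel1: +498912345678#"

print(bool(TEL_W.match(valid)))      # True
print(bool(TEL_W.match(with_hash)))  # False: \w does not match '#'
```

Changing both \w+ to [\w#]+ is one possible adjustment; it is not part of the original answer.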

You could try the regex below:
^\s*(?:(?:tel[12]):\s*\+(?:E(?=\d{4}49)\S+?\b|\d+))(?:,(?:(?:tel[12]):\s*\+(?:E(?=\d{4}49)\S+?\b|\d+)))+$
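Tested with Python's re (an assumption; the question names no language), this pattern accepts the valid example. As written, though, it requires a tel1:/tel2: prefix on every element (the question says the prefix is optional) and at least two comma-separated elements. The last two strings below are made up to show this:

```python
import re

# Pattern from the answer above, split over two raw strings for readability.
TEL_PAIRS = re.compile(r'^\s*(?:(?:tel[12]):\s*\+(?:E(?=\d{4}49)\S+?\b|\d+))'
                       r'(?:,(?:(?:tel[12]):\s*\+(?:E(?=\d{4}49)\S+?\b|\d+)))+$')

valid = ("tel1: +E1234498912345678,tel2: +498912345678,"
         "tel1: +E123449D1238912345678,tel2: +E1234498912345678")
no_prefix = "+498912345678,+E1234498912345678"  # hypothetical: tel prefixes omitted
single = "tel1: +498912345678"                   # hypothetical: only one element

print(bool(TEL_PAIRS.match(valid)))      # True
print(bool(TEL_PAIRS.match(no_prefix)))  # False
print(bool(TEL_PAIRS.match(single)))     # False
```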

(tel\d: |)\+(E\d\d\d\d49|49)
Demo: http://regex101.com/r/oG6lH3/1
This will also match tel3 or tel9. If that is an issue, use one of the other provided answers.
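This pattern also has no anchors and does not describe the rest of each number, so a search can succeed on a string whose first element is invalid. A Python sketch using the question's invalid example, shortened to two elements:

```python
import re

pattern = re.compile(r'(tel\d: |)\+(E\d\d\d\d49|49)')

# First element is invalid (E1234 is followed by 45, not 49).
invalid = "tel1: +E12344598912345678,tel2: +498912345678"

m = pattern.search(invalid)
print(m.group(0))  # tel2: +49
```

The search skips the bad first element and reports the second one, so on its own this pattern cannot validate the whole string.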

Related

Return dash followed by a single character

This works as expected:
([^\u0000-\u007F])+-हा([^\u0000-\u007F])+
Returns:
ब-हाणपूर
ब-हाणी
बनियन-हाफ
But I am looking for a single character followed by a dash. The expected output is:
ब-हाणपूर
ब-हाणी
I tried to replace the + sign with a character count, like this:
([^\u0000-\u007F]){1}-हा([^\u0000-\u007F])+
But it returned the same 3 results. How do I return the first 2?
You need anchors:
^([^\u0000-\u007F])-हा([^\u0000-\u007F])+$
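The difference is easy to see in a Python sketch (Python's re is an assumption, and the three words from the question are treated as separate strings). Without anchors, the one-character group simply matches न, the last letter of बनियन, right before the dash:

```python
import re

words = ["ब-हाणपूर", "ब-हाणी", "बनियन-हाफ"]

# Question's attempt: no anchors, so the search may start anywhere in the word.
unanchored = re.compile(r'([^\u0000-\u007F]){1}-हा([^\u0000-\u007F])+')
# Answer's fix: ^ and $ force exactly one non-ASCII character before the dash.
anchored = re.compile(r'^([^\u0000-\u007F])-हा([^\u0000-\u007F])+$')

print([w for w in words if unanchored.search(w)])  # all three words
print(unanchored.search("बनियन-हाफ").group(1))     # न, the letter right before the dash
print([w for w in words if anchored.search(w)])    # ['ब-हाणपूर', 'ब-हाणी']
```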
You asked: 'What if I need 5 characters to the left of the dash?'
The regex portion [^\u0000-\u007F], as written, matches a single non-ASCII character. If you want more or fewer than one, use a regex quantifier to describe how many you want.
In this case, if you want 5, you would use:
^([^\u0000-\u007F]{5})-हा([^\u0000-\u007F])+$
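One detail worth checking: the quantifier counts Unicode code points, not visible letters. In this Python sketch, बनियन has five code points because the vowel sign ि is a separate code point, so {5} matches it:

```python
import re

five = re.compile(r'^([^\u0000-\u007F]{5})-हा([^\u0000-\u007F])+$')

# {5} counts code points: बनियन is ब, न, ि, य, न (the vowel sign ि is its own code point).
print(len("बनियन"))                       # 5
print(five.search("बनियन-हाफ").group(1))  # बनियन
print(bool(five.search("ब-हाणी")))        # False
```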
Probably like this:
^([^\u0000-\u007F]){1}-हा([^\u0000-\u007F])+
^([^\u0000-\u007F]{1})-हा([^\u0000-\u007F]+)
(\b[^\u0000-\u007F]{1})-हा([^\u0000-\u007F]+)

Python Regex - How to extract the third portion?

My input has this format: (xxx)yyyy(zz)(eee)fff, where x, y, z, e, and f are all digits. The fff part is optional.
Input: x = (123)4567(89)(660)
Expected output: only the eee part, i.e. the number inside the third pair of parentheses (660 in my example).
I am able to achieve this so far:
re.search("\((\d*)\)", x).group()
Output: (123)
Expected: (660)
I am surely missing something fundamental. Please advise.
Edit 1: Just added fff to the input data format.
You could use findall to collect every match that has round brackets (), then print the third match:
import re
n = "(123)4567(89)(660)999"
# findall returns every "(digits)" group in order of appearance
r = re.findall(r"\(\d*\)", n)
print(r[2])
Output:
(660)
The (eee) part is identical to the (xxx) part in your regex. If you don't provide an anchor, or some sequencing requirement, then an unanchored search will match the first thing it finds, which is (xxx) in your case.
If you know the (eee) always appears at the end of the string, you could append an "at-end" anchor ($) to force the match at the end. Or perhaps you could append a following character, like a space or comma or something.
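To see why the $ anchor only helps when fff is absent, here is a small sketch with the two inputs used in this thread:

```python
import re

without_fff = "(123)4567(89)(660)"     # input from the question
with_fff = "(123)4567(89)(660)999"     # input from the findall answer

at_end = re.compile(r'\((\d*)\)$')     # parenthesized digits at the very end

print(at_end.search(without_fff).group(1))  # 660
print(at_end.search(with_fff))              # None: fff follows (eee)
```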
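Otherwise, you might do well to match the other parts of the pattern without capturing them. This pattern assumes the fixed widths of your example, where (xxx)yyyy(zz) is exactly 13 characters: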
pattern = r'[0-9()]{13}\((\d{3})\)'
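A short check of that fixed-width pattern, using the input with fff from the findall answer plus a made-up input whose first group has only two digits:

```python
import re

pattern = re.compile(r'[0-9()]{13}\((\d{3})\)')

print(pattern.search("(123)4567(89)(660)999").group(1))  # 660
# Hypothetical input with a two-digit first group: the fixed width no longer fits.
print(pattern.search("(12)4567(89)(660)"))               # None
```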
If you want the third group of numbers in brackets, you need to skip the first two groups. You can do that with a repeating non-capturing group that looks for digits enclosed in () followed by any number of non-( characters:
import re
x = '(123)4567(89)(660)'
# skip two "(digits)non-( chars" chunks, then capture the third "(digits)"
print(re.search(r"(?:\(\d+\)[^(]*){2}(\(\d+\))", x).group(1))
Output:
(660)
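The same idea works for any position. The helper below, nth_paren_group, is a hypothetical name rather than something from the thread. It replaces the {2} with {n-1}:

```python
import re

def nth_paren_group(s, n):
    """Return the n-th (1-based, n >= 1) '(digits)' group in s, or None.

    Hypothetical helper: it generalizes the {2} in the answer above to {n-1}.
    """
    pattern = r'(?:\(\d+\)[^(]*){%d}(\(\d+\))' % (n - 1)
    m = re.search(pattern, s)
    return m.group(1) if m else None

x = "(123)4567(89)(660)999"
print(nth_paren_group(x, 3))  # (660)
print(nth_paren_group(x, 1))  # (123)
print(nth_paren_group(x, 4))  # None
```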

Regex to get specific numbers with three digits

I'm trying to get a string to match this pattern:
C006,
C007,
C008,
C009,
C010,
C011
I have this:
C00[6-9]|1[0-1]
It works for "C006" through "C009", but for "C010" or "C011" the regex matches only 10 or 11.
Tested on http://rubular.com/r/gFKJ2eTyrz
Can anyone help me with this?
Thanks.
Your regex is parsed as two alternatives, because | has the lowest precedence of the regex operators:
C00[6-9]
or
1[0-1]
https://regex101.com/r/CsjWzT/3
You need to group the alternative patterns. Try:
C0(0[6-9]|1[0-1])
Demo: https://regex101.com/r/CsjWzT/1
If you want an exact match, use anchors:
^C0(0[6-9]|1[0-1])$
https://regex101.com/r/CsjWzT/2
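A Python sketch comparing the question's pattern with the grouped and anchored one (C005 and C012 are made-up values just outside the range):

```python
import re

broken = re.compile(r'C00[6-9]|1[0-1]')     # question's pattern
fixed = re.compile(r'^C0(0[6-9]|1[0-1])$')  # grouped and anchored

print(broken.search("C010").group())        # 10: only the second alternative matched
ids = ["C005", "C006", "C009", "C010", "C011", "C012"]
print([s for s in ids if fixed.search(s)])  # ['C006', 'C009', 'C010', 'C011']
```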
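Try this: C(\d){3}
C -> matches the character 'C'
\d -> matches any digit
{3} -> exactly three times
Be aware that this accepts any C followed by three digits (C000 to C999), not only C006 to C011.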
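In a quick Python check (C000 and C999 are made-up test values), every ID below matches. The capturing group also shows a quirk: a repeated group keeps only its last repetition.

```python
import re

pattern = re.compile(r'C(\d){3}')

for code in ["C006", "C011", "C000", "C999"]:
    print(code, bool(pattern.fullmatch(code)))  # all True

# A repeated capturing group keeps only its last repetition.
print(pattern.fullmatch("C010").group(1))  # 0
```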

How can I match this string '-,-,-,9,-'?

I was asked to validate strings like this: -,-,-,9,-
The fields are separated by commas; exactly one of them is a digit (0-9) and all the others are -.
Some examples:
9,-,-,-,-
-,-,-,-,9
-,-,2,-,-
How can I match this? And what concepts should I learn in regex?
Update
Sorry, I forgot to mention the count: the string can only contain 5 parts, so its length can only be 9. That means a string like the one below should not pass:
-,-,9,-,-,-
And of course, it should contain only one digit.
^(?=\D*\d\D*$)[0-9-](?:,[0-9-]){4}$
You can try this. See the demo:
https://regex101.com/r/nM7nT5/5
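A Python sketch that runs this pattern over the question's examples. The no-digit and two-digit strings are made-up negative cases:

```python
import re

pattern = re.compile(r'^(?=\D*\d\D*$)[0-9-](?:,[0-9-]){4}$')

cases = {
    "9,-,-,-,-": True,
    "-,-,-,-,9": True,
    "-,-,2,-,-": True,
    "-,-,9,-,-,-": False,  # six parts
    "-,-,-,-,-": False,    # no digit (hypothetical case)
    "1,-,2,-,-": False,    # two digits (hypothetical case)
}
for s, expected in cases.items():
    assert bool(pattern.match(s)) == expected, s
print("all cases behave as expected")
```

The lookahead enforces "exactly one digit" over the whole string, while [0-9-](?:,[0-9-]){4} fixes the shape to five comma-separated fields.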
This ensures that the string has at least one comma and exactly one digit. Note that it does not limit the string to five parts.
^(?:\d(?:,-)+|-(?:,-)*,\d(?:,-)*)$
OR
^(?=\D*(?:^|,)\d(?:,|$)\D*$)[\d-](?:,[\d-])+$
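A quick Python check confirms that both of these patterns accept valid strings but also accept the six-part string the question's update rules out:

```python
import re

first = re.compile(r'^(?:\d(?:,-)+|-(?:,-)*,\d(?:,-)*)$')
second = re.compile(r'^(?=\D*(?:^|,)\d(?:,|$)\D*$)[\d-](?:,[\d-])+$')

valid = "-,-,2,-,-"
six_parts = "-,-,9,-,-,-"
print(bool(first.match(valid)), bool(second.match(valid)))          # True True
print(bool(first.match(six_parts)), bool(second.match(six_parts)))  # True True
```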

Why does * in a regular expression control whether its previous character can appear?

Take this example, which I found in a blog:
"How about searching for apple word which was spelled wrong in a given file where apple is misspelled as ale, aple, appple, apppple, apppppple etc. To find all patterns
grep 'ap*le' filename
Readers should observe that the above pattern will match even ale word as * indicates 0 or more of previous character occurrence."
The blog says that "ale" will be accepted by ap*le. Aren't "ap" and "le" fixed?
The * is a quantifier meaning 0 or more times for the previous pattern, in this case a single literal p. You can also write the same thing as * with an explicit range quantifier:
ap{0,}le
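A Python sketch of that equivalence (Python's re is used here; with grep's default basic regular expressions the braces would need escaping as \{0,\}, or you could use grep -E). The words come from the question, plus able as a non-match:

```python
import re

words = ["ale", "aple", "apple", "appple", "apppppple", "able"]

# fullmatch so each pattern must cover the whole word
star = [w for w in words if re.fullmatch(r'ap*le', w)]
braces = [w for w in words if re.fullmatch(r'ap{0,}le', w)]

print(star)            # ['ale', 'aple', 'apple', 'appple', 'apppppple']
print(star == braces)  # True
```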
The interesting question is sometimes 'what is the previous pattern?' It often helps to put a pattern in a group to make clear what the 'previous pattern' is.
Consider wanting to find any of:
ale, aple, appple, apppple, apppppple, able, abbbbbbble
Your first try might be:
/ap|b*le/
ap      the first alternative: literal 'a' then literal 'p' (WRONG: the regex will just match 'ap')
|       or
b*le    the second alternative: zero or more literal 'b', then 'le'
What you want in this case is:
/a(?:p|b)*le/
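A Python sketch contrasting the two patterns. The ungrouped one only ever matches fragments, while the grouped one matches each whole word from the list above:

```python
import re

# Ungrouped: | splits the whole pattern into "ap" OR "b*le".
on_apple = re.search(r'ap|b*le', "apple").group()
on_able = re.search(r'ap|b*le', "able").group()
print(on_apple, on_able)  # ap ble

# Grouped: * repeats only (?:p|b), with 'a' before it and 'le' after it.
grouped = re.compile(r'a(?:p|b)*le')
words = ["ale", "aple", "appple", "able", "abbbbbbble"]
matched = [w for w in words if grouped.fullmatch(w)]
print(matched)  # all five words
```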
If you do not want to match ale and only want aple, appple, apppple, apppppple, use + instead of *; + means one or more:
/ap+le/
This is equivalent to /ap{1,}le/
And if you only want to match aple and appple, leaving out the variants with more than three p's, add a maximum to the quantifier:
/ap{1,3}le/
All the variants above will also match the correctly spelled apple. If you want only aple and appple, and not apple, use alternation:
/a(?:p|p{3})le/
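The three variants side by side in a Python sketch, using the misspellings from the question plus apple:

```python
import re

words = ["ale", "aple", "apple", "appple", "apppple", "apppppple"]
patterns = [r'ap+le', r'ap{1,3}le', r'a(?:p|p{3})le']

# fullmatch so that each pattern must cover the whole word
results = {p: [w for w in words if re.fullmatch(p, w)] for p in patterns}
for p, hits in results.items():
    print(p, hits)
```

ap+le drops only ale, ap{1,3}le keeps aple, apple, and appple, and the alternation keeps just aple and appple.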
No, it's not.
Here * means zero or more occurrences of p, while a and le are fixed. If you need ap and le to be fixed, this is what you need:
ap+le
+ means at least once, with no upper limit on the number of occurrences.
So the pattern now requires one or more p's after a and before l, and it won't match ale.