I have a regex
\(?\+\(?49?\)?[ ()]?([- ()]?\d[- ()]?){11}
This correctly matches German phone numbers such as
+491739341284
+49 1739341284
(+49) 1739341284
+49 17 39 34 12 84
+49 (1739) 34 12 84
+(49) (1739) 34 12 84
+49 (1739) 34-12-84
but fails to match 0049 (1739) 34-12-84.
I need to adjust the regular expression so that it also matches numbers starting with 0049. Can anyone help me with the regex?
Try this one:
\(?\+|0{0,2}\(?49\)?[ ()]*[ \d]+[ ()]*[ -]*\d{2}[ -]*\d{2}[ -]*\d{2}
https://regex101.com/r/CHjNBV/1
However, it's better to make it accept only +49 or 0049 and show an error message if the number fails validation, because if you someday need to extend the format, the regex will have to become much more complicated.
If you want to match the variations in the question, you might use a pattern like:
^(?:\+?(?:00)?(?:49|\(49\))|\(\+49\))(?: *\(\d{4}\)|(?: ?\d){4})? *\d\d(?:[ -]?\d\d){2}$
Explanation
^ Start of string
(?: Non capture group
\+? Match an optional +
(?:00)? Optionally match 2 zeroes
(?:49|\(49\)) Match 49 or (49)
| Or
\(\+49\) Match (+49)
) Close non capture group
(?: Non capture group
 * Match optional spaces (the token is a space followed by *)
\(\d{4}\) Match ( 4 digits and )
| Or
(?: ?\d){4} Repeat 4 times matching an optional space and a digit
)? Close non capture group and make it optional
 * Match optional spaces (the token is a space followed by *)
\d\d Match 2 digits
(?:[ -]?\d\d){2} Repeat 2 times matching an optional space or - followed by 2 digits
$ End of string
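As a quick check, here is a minimal Python sketch that runs the pattern above against the sample numbers from the question. The names STRICT, samples and results are only illustrative, and Python's re module is assumed as the engine.

```python
import re

# Strict pattern from the answer above, copied unchanged (split over two
# string literals only for readability).
STRICT = re.compile(
    r"^(?:\+?(?:00)?(?:49|\(49\))|\(\+49\))"
    r"(?: *\(\d{4}\)|(?: ?\d){4})? *\d\d(?:[ -]?\d\d){2}$"
)

# Sample numbers taken from the question, including the 0049 variant.
samples = [
    "+491739341284",
    "+49 1739341284",
    "(+49) 1739341284",
    "+49 17 39 34 12 84",
    "+49 (1739) 34 12 84",
    "+(49) (1739) 34 12 84",
    "+49 (1739) 34-12-84",
    "0049 (1739) 34-12-84",
]

# Map each sample to whether the pattern accepts it.
results = {s: bool(STRICT.match(s)) for s in samples}
for s, ok in results.items():
    print(f"{s!r:28} -> {ok}")
```

Every sample should be reported as a match, while a number with a different country code such as +48 is rejected.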
Or a bit broader variation matching the 49 prefix variants, followed by matching 10 digits allowing optional repetitions of what is in the character class [ ()-]* in between the digits.
^(?:\+?(?:00)?(?:49|\(49\))|\(\+49\))(?:[ ()-]*\d){10}$
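To illustrate how much looser this variant is, here is a small Python sketch. The helper name is_german_number is made up for the example, and fullmatch is used so that a trailing newline is not silently accepted by $.

```python
import re

# Broader pattern from the answer above, copied unchanged.
BROAD = re.compile(r"^(?:\+?(?:00)?(?:49|\(49\))|\(\+49\))(?:[ ()-]*\d){10}$")


def is_german_number(text):
    """Return True if text is a 49 prefix followed by exactly 10 digits,
    with optional spaces, parentheses or hyphens before each digit."""
    return BROAD.fullmatch(text) is not None


print(is_german_number("0049 (1739) 34-12-84"))   # grouping from the question
print(is_german_number("+49 (17) 39 34 12 84"))   # grouping the strict pattern rejects
print(is_german_number("+49 173934128"))          # only 9 digits after the prefix
```

The second call shows the trade-off: any placement of separators is accepted as long as the digit count is right.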
We have the file formats below:
60min-->
A20210217.0300-0000-0400-0000_GBM053.xml.gz
15min -->
A20210217.0300-0000-0315-0000_GBM053.xml.gz ,A20210217.0315-0000-0330-0000_GBM053.xml.gz, A20210217.0330-0000-0345-0000_GBM053.xml.gz , A20210217.0345-0000-0400-0000_GBM053.xml.gz
I tried the regex below, but it is not working:
!(^A[0-9]{8}.[0-9]{2}[0]{2}-[0-9]{4}-[0-9]{2}[0]{2}-[0-9]{4}_.*.xml(|\.gz)$)
The ! at the start of the pattern matches a ! literally, which is not present in the example data. If it was meant as a delimiter, it should also be at the end.
You could match the minutes as either 15, 30 or 45, and use an alternation so that those values may appear in either the first or the third part of the hyphenated string.
^A\d{8}\.(?:\d\d(?:[14]5|30)(?:-\d{4}){3}|\d{4}-\d{4}-\d\d(?:[14]5|30)-\d{4})_.*\.xml\.gz$
The pattern matches
^ Start of string
A\d{8}\. Match A and 8 digits followed by a .
(?: Non capture group for the alternation to match either
\d\d(?:[14]5|30) Match 2 digits and either 15 or 45 or 30
(?:-\d{4}){3} Match 3 times - and 4 digits
| Or
\d{4}-\d{4}- Match 2 times 4 digits and -
\d\d(?:[14]5|30)-\d{4} Match 2 digits and either 15 or 45 or 30 followed by - and 4 digits
) Close non capture group
_.*\.xml\.gz Match _, 0+ times any char except a newline and .xml.gz
$ End of string
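The following Python sketch applies the pattern to the file names from the question. The names QUARTER_HOUR, names and quarter_hour_files are chosen only for this example.

```python
import re

# Pattern from the answer above, copied unchanged.
QUARTER_HOUR = re.compile(
    r"^A\d{8}\.(?:\d\d(?:[14]5|30)(?:-\d{4}){3}"
    r"|\d{4}-\d{4}-\d\d(?:[14]5|30)-\d{4})_.*\.xml\.gz$"
)

names = [
    "A20210217.0300-0000-0400-0000_GBM053.xml.gz",  # 60 min file
    "A20210217.0300-0000-0315-0000_GBM053.xml.gz",  # 15 min files
    "A20210217.0315-0000-0330-0000_GBM053.xml.gz",
    "A20210217.0330-0000-0345-0000_GBM053.xml.gz",
    "A20210217.0345-0000-0400-0000_GBM053.xml.gz",
]

# Keep only names where the start or the end time has minutes 15, 30 or 45.
quarter_hour_files = [n for n in names if QUARTER_HOUR.match(n)]
print(quarter_hour_files)
```

Only the 60 min file is filtered out, because both of its times end in 00.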
https://regex101.com/r/KqB81T/2
^A\d{8}\.(\d{2}(?:[14]5|30)-0000-\d{4}-0000|\d{4}-0000-\d{2}(?:[14]5|30)-0000)_.*\.xml(|\.gz)$
Breaking down the structure:
Files whose start time falls on minute 15, 30 or 45 are matched by: \d{2}(?:[14]5|30)-0000-\d{4}-0000
Files whose end time falls on minute 15, 30 or 45 are matched by: \d{4}-0000-\d{2}(?:[14]5|30)-0000
Combine the two with an alternation (the union of the two sets of matches): (FIRST_MATCH|SECOND_MATCH). Also make sure there is no extra character or space at the end (between gz and $).
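As a rough illustration, this Python sketch uses the capture group of that pattern to pull out the time window. The function name time_window is hypothetical.

```python
import re

# Pattern from this answer, copied unchanged. Group 1 holds the time window,
# group 2 the optional ".gz" suffix.
PATTERN = re.compile(
    r"^A\d{8}\.(\d{2}(?:[14]5|30)-0000-\d{4}-0000"
    r"|\d{4}-0000-\d{2}(?:[14]5|30)-0000)_.*\.xml(|\.gz)$"
)


def time_window(name):
    """Return the matched time window of a 15 min file, or None otherwise."""
    m = PATTERN.match(name)
    return m.group(1) if m else None


print(time_window("A20210217.0300-0000-0315-0000_GBM053.xml.gz"))
print(time_window("A20210217.0315-0000-0330-0000_GBM053.xml"))
print(time_window("A20210217.0300-0000-0400-0000_GBM053.xml.gz"))
```

Unlike the earlier pattern, this one also accepts an uncompressed .xml file, and it requires the second and fourth fields to be exactly 0000.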
I am working on a regular expression to match against a hexadecimal string and am having some trouble near the end. I am specifically looking for groups of 2 bytes that do not contain 00 that are between 2 and 8 bytes long. I have it all working, except that when there are fewer than 8 bytes, it sometimes allows extra 00 in the match.
https://regex101.com/r/jq3QpP/1/
(?!(00)+)([0-9a-fA-F]{2,8})?(?!(00)+) // This on the below text gives the following matches
C86B0200554E0200C86B02000000000000000000270000008109000000000000EC6A050079750
18881000000410000280100000000000000000001000002010400000000000000000000000000
0000000000000000000000F65FA45900000000FF0000002F0000000000000049000000403C9F5
A000000000000000000000000FFFF330000000000000F06EAE8333536
Match 1
Full match 0-8 `C86B0200`
Group 2. 0-8 `C86B0200`
Match 2
Full match 8-16 `554E0200`
Group 2. 8-16 `554E0200`
Match 3
Full match 16-21 `C86B0`
Group 2. 16-21 `C86B0`
Match 4
Full match 21-21 ``
Match 5
Full match 39-47 `02700000`
Group 2. 39-47 `02700000`
In matches 1, 2 and 5 there are extra 00, and in match 3 it missed the 20 for some reason. If you have an idea what I missed, please let me know.
You can avoid matching 00 by allowing only one 0 in two digits at a time instead:
(?:[A-F1-9][A-F0-9]|[A-F0-9][A-F1-9]){1,4}(?=(?:..)*$)
Demo: https://regex101.com/r/2hebvf/2
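Here is a minimal Python sketch of that pattern on a short input, assuming the hex string is a single line of even length with uppercase digits. The input is the first 16 characters of the question's data, and the names NONZERO_BYTES, data and runs are illustrative.

```python
import re

# Pattern from the answer above, copied unchanged. Each repetition consumes
# one byte (two hex digits) that is not 00; the lookahead requires an even
# number of characters to remain, which keeps matches aligned to byte pairs.
NONZERO_BYTES = re.compile(
    r"(?:[A-F1-9][A-F0-9]|[A-F0-9][A-F1-9]){1,4}(?=(?:..)*$)"
)

data = "C86B0200554E0200"
runs = NONZERO_BYTES.findall(data)
print(runs)  # ['C86B02', '554E02']
```

Because the alignment is counted from the end of the string, a stray newline inside the data would shift the pairs, so wrapped input should be joined into one line first.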
I've got a document that looks something like this:
# Document ID 8934
# Last updated 2018-05-06
52 84 12 70 23 2 7 20 1 5
4 2 7 81 32 98 2 0 77 6
(..and so on..)
In other words, it starts off with a few comment lines, then the rest of the document is just a bunch of numbers separated by spaces.
I'm trying to write a regex that gets all digits on all lines that don't start with #, but I can't seem to get it.
I've read over answers such as
Regular Expressions: Is there an AND operator?
Regex: Find a character anywhere in a document but only on lines that begin with a specific word
and pawed through sites such as http://regular-expressions.info, but I still can't get an expression that works (the best I can get is a lengthy version of ^[^#].*).
So how can I match digits (or text, or whatever) in a string, but only on lines that don't start with a certain character?
Your regex ^[^#].* uses a negated character class which matches not a # from the start of the string ^ and after that matches any character zero or more times.
This would for example also match t test
What you might do is use an alternation to match a whole line ^#.*$ that starts with a # or capture in a group one or more digits (\d+)
Your digits are captured in group 1. You could change the (\d+) to, for example, a character class ([\w+.]+) to match more than only digits.
(?:^#.*$|(\d+))
Details
(?: Non capturing group
^#.*$ Match from the start of the line ^ a # followed by any character zero or more times .* until the end of the line $ (this assumes the multiline flag, so that ^ and $ anchor at line boundaries)
| Or
(\d+) capture one or more digits in a group
) Close non capturing group
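A minimal Python sketch of this approach, assuming the multiline flag so that ^ and $ apply per line. The variable names are illustrative; matches that came from a comment line have no group 1 and are skipped.

```python
import re

document = """\
# Document ID 8934
# Last updated 2018-05-06
52 84 12 70 23 2 7 20 1 5
4 2 7 81 32 98 2 0 77 6
"""

# Either consume a whole comment line (no capture) or capture a run of digits.
pattern = re.compile(r"(?:^#.*$|(\d+))", re.MULTILINE)

# group(1) is None when the comment-line alternative matched.
numbers = [m.group(1) for m in pattern.finditer(document) if m.group(1)]
print(numbers)
```

The digits inside the comment lines (8934 and the date) never reach group 1, because the first alternative consumes those lines whole.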
I think a way simpler method would be to replace the lines with "" first with this regex:
^#.*
And then you can just match all the numbers with this:
-?\d+ (-? is for negative)
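In Python, the two steps might look like the sketch below. The function name numbers_outside_comments is made up for the example, and the multiline flag is assumed so that ^ matches at every line start.

```python
import re


def numbers_outside_comments(text):
    """Drop lines starting with '#', then collect all (possibly negative) integers."""
    # Step 1: blank out every comment line.
    without_comments = re.sub(r"^#.*", "", text, flags=re.MULTILINE)
    # Step 2: match the remaining numbers.
    return [int(n) for n in re.findall(r"-?\d+", without_comments)]


sample = "# Document ID 8934\n# Last updated 2018-05-06\n52 84 -3\n4 2\n"
print(numbers_outside_comments(sample))  # [52, 84, -3, 4, 2]
```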
I have this text:
14 two 25 three 12 four 40 five 10
I want to obtain "14 two 14 25 three 14 25 12 four 14 25 12 40 five 14 25 12 40 10"
For example, when I replace (14 two ) with (14 two 14 ), the next search starts after the inserted 14; I can't make it start right after two.
Is there another way to do this?
For example, could I use a group that is not part of the match (a group before the match) in the replacement?
Please help.
This should do the trick for you:
Regex: ((?:\s?\d+\s?)+)((?:[a-zA-Z](?![^a-zA-Z]+\1))+)
Replacement: $1$2 $1
You will need to click on the "replace all" button repeatedly for this to work (it cannot be done in one shot; it has to be repeated for as long as a match can be found). Online PHP example
Explanation:
\s: Match a single space character
?: the previous expression must be matched 0 or 1 time.
\s?: Match a space character 0 or 1 time.
\d: Match a digit character (the equivalent of [0-9]).
+: The previous expression must be matched at least one time (up to infinity).
\d+: Match as many digit characters as possible (but at least one).
(): Capture group
(?:): Non-capturing group
((?:\s?\d+\s?)+): Match an optional space character followed by one or more digit characters followed by an optional space character. The expression is surrounded by a non-capturing group followed by a plus. That means that the regex will try to match as many combinations of space and digit characters as it can (so you can end up with something like '14 25 12 40').
The capture group is meant to keep the value so it can be reused in the replacement. You cannot simply add the plus at the end of the capture group without the non-capturing group inside, because it would only remember the last digits captured ('12' instead of the whole '14 25 12' used to build '14 25 12 40').
[a-zA-Z]: Match any English letters in any case (lower, upper).
\1: Reference to what has been captured in the first group.
(?!): Negative lookahead.
[^]: Negated character class, so [^a-zA-Z] means match anything that is not an English letter.
((?:[a-zA-Z](?![^a-zA-Z]+\1))+): The negative lookahead is meant to make sure that we don't always end up matching the first "14 two" in the input text. Without it, we would end up in an infinite loop giving results such as "14 two 14 14 14 14 14 14 25 three 12 four 40 five 10" (the "14" before "25" being repeated until you reach the timeout).
Basically, for every English letter we match, we look ahead to assert that the content of the first capture group (for example "14") is not present in the digit sequence that follows.
For the replacement, $1$2 $1 means put the content of the capture group 1 and 2, add a space and put the content of the capture group 1 once more.
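If a scripting language is available, the same output can be produced in a single pass. The sketch below is not the repeated regex replacement described above but a different technique: a Python replacement callback that remembers every number seen so far. The function name accumulate_numbers is hypothetical.

```python
import re


def accumulate_numbers(text):
    """Replace each number with all numbers seen so far, itself included."""
    seen = []

    def replace(match):
        # Remember the current number, then emit the running list.
        seen.append(match.group(0))
        return " ".join(seen)

    return re.sub(r"\d+", replace, text)


print(accumulate_numbers("14 two 25 three 12 four 40 five 10"))
# 14 two 14 25 three 14 25 12 four 14 25 12 40 five 14 25 12 40 10
```

This avoids the repeat-until-no-match loop entirely, at the cost of needing code rather than an editor's replace dialog.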
I've done some searching but can't find the right regex.
I would like a regex for a text that only contains digits, whitespace and plus signs.
like: [0-9 +]
But with a min/max limit for only the digits in that text.
My suggestions ended up with something like this:
^[0-9 \+](?=(.*[0-9]){5,8})$
Should be OK:
"123 456 7"
"12345"
"+ 123 456 78"
Should not be ok:
"123456789"
"+ 124 578a"
"+123456789"
Anyone got a solution that might do the trick?
Edit:
I can see that my explanation of what I'm aiming for was too brief.
My regex conditions should be:
Must include between 5-8 digits
Allow whitespaces and plus signs
I'm guessing from your own regex that between 5 and 8 digits in a row without whitespace in between are allowed. If that's true, then the following regex might do the trick (example written in Python). It allows a single group of between 5 and 8 digits. If there is more than one group, each group must have exactly 3 digits except for the last group, which can be between 1 and 3 digits long. A single plus sign on the left is optional.
Are you parsing phone numbers? :)
In [176]: regex = re.compile(r"""
^ # start of string
(?: \+\s )? # optional plus sign followed by whitespace
(?:
(?: \d{3}\s )+ # one or more groups of three digits followed by whitespace
\d{1,3} # one group of between one and three digits
| # ALTERNATIVE
\d{5,8} # one group of between five and eight digits
)
$ # end of string
""", flags=re.X)
# --- MATCHES ---
In [177]: regex.findall('123 456 7')
Out[177]: ['123 456 7']
In [178]: regex.findall('12345')
Out[178]: ['12345']
In [179]: regex.findall('+ 123 456 78')
Out[179]: ['+ 123 456 78']
In [200]: regex.findall('12345678')
Out[200]: ['12345678']
# --- NON-MATCHES ---
In [180]: regex.findall('123456789')
Out[180]: []
In [181]: regex.findall('+ 124 578a')
Out[181]: []
In [182]: regex.findall('+123456789')
Out[182]: []
In [198]: regex.findall('123')
Out[198]: []
In [24]: regex.findall('1234 556')
Out[24]: []
You can do something like this:
^(?:[ +]*[0-9]){5}(?:(?:[ +]*[0-9])?){3}$
See it here on Regexr
The first group (?:[ +]*[0-9]){5} matches the minimum of 5 digits, each optionally preceded by any number of spaces and plus signs. The second part (?:(?:[ +]*[0-9])?){3} matches up to 3 further optional digits, again each optionally preceded by any number of spaces and plus signs.
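A short Python check of this pattern against the examples from the question. The names DIGITS_5_TO_8, should_match and should_fail are illustrative only.

```python
import re

# Pattern from the answer above, copied unchanged.
DIGITS_5_TO_8 = re.compile(r"^(?:[ +]*[0-9]){5}(?:(?:[ +]*[0-9])?){3}$")

should_match = ["123 456 7", "12345", "+ 123 456 78"]
should_fail = ["123456789", "+ 124 578a", "+123456789"]

matched = [s for s in should_match + should_fail if DIGITS_5_TO_8.match(s)]
print(matched)
```

Note that spaces and plus signs are only allowed before a digit, so an input ending in a space or a plus is rejected.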
You were very close - you need to anchor the lookahead to the start of input, and add a second negative lookahead for the upper bound of the quantity of digits:
^(?=(.*\d){5,8})(?!(.*\d){9,})[\d +]+$
Also, fyi you don't need to escape the plus sign within the character class, and [0-9] is \d
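To sanity-check the two lookaheads, this Python sketch compares the regex with a plain, non-regex count of the digits. The helper plain_check is hypothetical and only meant for ASCII input like the examples in the question.

```python
import re

# Pattern from the answer above, copied unchanged.
LOOKAHEAD = re.compile(r"^(?=(.*\d){5,8})(?!(.*\d){9,})[\d +]+$")


def plain_check(s):
    """Same rule without a regex: only digits, spaces and plus signs,
    and between 5 and 8 digits in total."""
    allowed = all(ch.isdigit() or ch in " +" for ch in s)
    digits = sum(ch.isdigit() for ch in s)
    return bool(s) and allowed and 5 <= digits <= 8


samples = ["123 456 7", "12345", "+ 123 456 78",
           "123456789", "+ 124 578a", "+123456789", "1234"]

# Both checks should agree on every sample.
agreement = {s: bool(LOOKAHEAD.match(s)) == plain_check(s) for s in samples}
print(agreement)
```

The first lookahead enforces the lower bound, the negative lookahead enforces the upper bound, and the character class restricts which characters may appear at all.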