Regex, capture 4 numbers surrounded by . or line start/end - c++

I'm trying to capture any 1-2 digit numbers surrounded by '.' or the beginning/end of a line.
E.G
1.0.4.11
71.11.11.11
0.11.0.0
Are valid and:
1.
1111
11.11.11.
01.10
are not valid
Right now I've got (?<=\.|^)\d{1,2}]?(?=\.|$) which will capture the numbers correctly but will also capture groups such as 11.. or 1.11.
I need to extend this regex to basically verify that it is always in the format x.x.x.x where x is 1-2 digits.
For additional information, this regex will run using the wxWidgets regex class but I believe that's the standard regex parser.
NOTE
For anyone using this as a reference: with wxWidgets, the wxRegex class must be constructed with the wxRE_ADVANCED flag, because by default it uses a basic/fast implementation that does not include the quantifiers (?*) used in this expression.

You can make it less generic by specifically looking for your 4 groups between the start and end of the string (you can remove the \.? if you never have a . at the start or end):
^\.?\d{1,2}\.\d{1,2}\.\d{1,2}\.\d{1,2}\.?$
See in Regex101
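As a quick check, here is a minimal sketch that runs the pattern against the examples from the question. Python's re module is used purely for illustration (the question itself targets wxWidgets), and the names VERSION_RE and is_valid are made up for this example.

```python
import re

# Pattern from the answer above; the optional \.? at each end tolerates
# a single leading or trailing dot.
VERSION_RE = re.compile(r"^\.?\d{1,2}\.\d{1,2}\.\d{1,2}\.\d{1,2}\.?$")

def is_valid(text: str) -> bool:
    """Return True if text is four 1-2 digit groups separated by dots."""
    return VERSION_RE.search(text) is not None

valid = ["1.0.4.11", "71.11.11.11", "0.11.0.0"]
invalid = ["1.", "1111", "11.11.11.", "01.10"]

for s in valid + invalid:
    print(f"{s!r}: {is_valid(s)}")
```

All three valid examples are accepted and all four invalid ones are rejected. Dropping both \.? parts makes the check stricter if a leading or trailing dot should never be allowed.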

Related

Regex - match number within a text that does not start with a certain string

I've searched through multiple answers on SO now, but most of them consider the beginning of the line as the whole string being looked upon, which doesn't serve my case, I think (at least all the answers I tried didn't work).
So, I want to match all codes within a text that are 7-digit long, start with 1 or 2, and are not prefixed by "TC-" and its lowercase variants.
I came up with the /(!?TC-){0}(1|2)\d{6}/g expression, but it still matches the codes that are prefixed with "TC-", and I don't know how I can prevent those from being selected. Is there a way to do that?
I've created an example pattern on Regexr: regexr.com/6p70c.
You can assert that TC- is not to the left using a negative lookbehind (?<! and omit the {0} quantifier, as that makes the group match zero times, so it has no effect:
(?<!\bTC-)\b[12]\d{6}\b
Regex demo
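A small sketch of the lookbehind in action, using Python's re module as an assumed host language. The re.IGNORECASE flag is my addition to cover the lowercase "tc-" variants mentioned in the question; the sample text and the name CODE_RE are invented for the example.

```python
import re

# Negative lookbehind from the answer; re.IGNORECASE is added here as an
# assumption so that lowercase prefixes such as "tc-" are rejected too.
CODE_RE = re.compile(r"(?<!\bTC-)\b[12]\d{6}\b", re.IGNORECASE)

text = "TC-1234567, 1234567, tc-2345678, 2000000, 3123456, 12345678"
codes = CODE_RE.findall(text)
print(codes)  # ['1234567', '2000000']
```

The prefixed codes, the code starting with 3, and the 8-digit number are all skipped.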

Remove in-text citation numbers but not decimal numbers without referencing groups? (regex)

I've written a small Python program to make regex changes and to convert my PDF textbook into audio files to listen to while I drive. It occurred to me that I could use the PDF reading program Librera Reader, which has built-in TTS and regex replacement, to do this task more flexibly while being able to read along easily. However, Librera Reader can't use a group reference in the replacement text.
This is the substitution I had been using:
([a-zA-Z|\)|%][\.|\,|a-z|\)])\d+(?:[-,]\d+)*
Here is a simplified version that does most of the work for the purpose of this question:
([a-zA-Z][\.])\d+
Replaced with:
\1
Is there a way to use regex to match a number that follows a letter and a period, without using a group reference in the replacement and without matching a number-period-number string, so that I could make the following conversion?
test words.7 Also 1.5 is a number that can test.9
test words. Also 1.5 is a number that can test.
I understand you used | inside [...] to "better" visually separate parts of the character class, but you also made | part of the class that now matches a literal pipe. You need to remove these pipes.
To solve the current problem, you may turn the capturing group into a positive lookbehind because the pattern is of known length (only two chars before the number (range) you want to remove).
You may use
(?<=[a-zA-Z)%][.,a-z)])\d+(?:[-,]\d+)*
See the regex demo
The (?<=[a-zA-Z)%][.,a-z)]) positive lookbehind matches a location that is immediately preceded by
[a-zA-Z)%] - an ASCII letter, ) or % and then
[.,a-z)] - ., ,, a lowercase ASCII letter or ).
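To see the lookbehind do the job without a group reference, here is a minimal sketch in Python (used only for illustration; Librera Reader's own regex engine may differ). The replacement is the empty string, and the name CITATION_RE is made up.

```python
import re

# Lookbehind pattern from the answer: only the number is matched, so the
# two characters before it never need to be put back.
CITATION_RE = re.compile(r"(?<=[a-zA-Z)%][.,a-z)])\d+(?:[-,]\d+)*")

before = "test words.7 Also 1.5 is a number that can test.9"
after = CITATION_RE.sub("", before)
print(after)  # test words. Also 1.5 is a number that can test.
```

The decimal 1.5 survives because the character before its period is a digit, which the first character class does not allow.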

How to find a sequence of formatted digits in Apache Nifi using a regular expression?

I want to find using Apache Nifi this kind of text in a CSV with lots of text:
nnnn?nn
where n is a digit between 0 and 9, and ? is a literal question mark.
A real example is:
8764?23
It always has 4 digits before ? and 2 digits after.
How can this be done?
Starting off simple:
\d{4}\?\d{2}
But this would also match 8764?23 within a longer string such as 98764?23 or 8764?234.
If you need to find exact matches as individual values within the CSV, a more complex regular expression is needed:
(?:^|,)\s*(\d{4}\?\d{2})\s*(?:,|$)
This may look a bit strange at first sight so let's break it down:
(?:^|,) uses the (something|something else) syntax to allow a choice of two different things - here it is allowing either the very start of the string ^ or a comma ,. The ?: at the start excludes this expression from being included as a capturing group.
\s* allows any amount of whitespace (i.e. zero or more spaces, tabs etc.) to appear before the matched expression.
(\d{4}\?\d{2}) specifies exactly 4 digits \d{4} followed by a question mark \? (which needs to be escaped to distinguish it from the regex ? meaning 0 or 1 occurrences), followed by 2 more digits \d{2}. The surrounding brackets () are used to specify this as a capturing group.
\s* allows more whitespace after the matched expression.
(?:,|$) allows either a comma , or the end of the string $ and ?: excludes this from being a capturing group.
Demo
https://regex101.com/r/X0Ic4v/1
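The breakdown above can be tried out with a short sketch. Python's re module stands in for NiFi's Java regex engine here, which is an assumption; the sample rows are invented. One thing worth knowing: because the trailing comma is consumed by the match, two valid values sitting directly next to each other yield only the first. The second pattern, ADJACENT_RE, is a variant of my own (not part of the answer) that checks the trailing delimiter with a lookahead instead.

```python
import re

# Pattern from the answer; group 1 holds the value itself.
STRICT_RE = re.compile(r"(?:^|,)\s*(\d{4}\?\d{2})\s*(?:,|$)", re.MULTILINE)

# Variant (an assumption, not part of the answer): the trailing delimiter is
# only asserted with a lookahead, so the comma stays available as the
# leading delimiter of the next value.
ADJACENT_RE = re.compile(r"(?:^|,)\s*(\d{4}\?\d{2})\s*(?=,|$)", re.MULTILINE)

row = "abc, 8764?23 ,98764?23,8764?234"
print(STRICT_RE.findall(row))         # ['8764?23']

adjacent = "8764?23,1234?56"
print(STRICT_RE.findall(adjacent))    # ['8764?23']
print(ADJACENT_RE.findall(adjacent))  # ['8764?23', '1234?56']
```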
Usage
The above can be used with Nifi's ExtractText to get the first capturing group for each match. Since it is only the capturing group that is of interest and not the rest of the match, "Include Capture Group 0" can be set to false. Presumably both "Enable Multiline Mode" and "Enable repeating capture group" should be set to true.
Further considerations
The above assumes that 8764?23 appears exactly like that as a value in a CSV string. But maybe you need to allow "8764?23"? Or possibly others such as '8764?23', _8764?23_ or even ABC8764?23DEF? There are too many possible variants for a one-size-fits-all solution here, so please reply in the comments to state the requirements if the above doesn't fit your needs.
Here is your regular expression: \d\d\d\d\?\d\d, along with a tool where you can try it (and a more complicated version).
This is the Regex required for your needs.
(\d{4}\?\d\d)

positive look ahead and replace

Recently I've been writing and testing regexps on https://regex101.com/.
My question is: is it possible to do a positive look-ahead AND a replacement in the same "replacement"? Or is only a limited kind of replacement possible?
The input is several lines of phone numbers. Let's say a phone number is correct when it contains exactly 11 digits, no matter how the digits are grouped with - and / characters, and no matter whether it starts with +, with 00, or with neither.
Some example lines:
+48301234567
+48/30/1234567
+48-30-12-345-67
+483011223344556677
0048301234567
+(48)30/1234567
A positive look-ahead is able to check whether, from the beginning to the end of the line, there are exactly 11 digits, regardless of how many of the characters specified above separate them. This works perfectly.
Where the positive look-ahead check succeeds, I would like to delete every character except the digits. The replacement works fine as long as I don't involve the look-ahead.
The regexp itself works perfectly when checked on its own ("gm" modes):
^(?:\+|00)?(?:[\-\/\(\)]?\d){11}$
The replace part also works perfectly on its own (replacing with nothing):
[^\d\n]
Putting the check into a look-ahead, followed by the deletion of the non-newline, non-digit characters from the matching lines:
(?=^(?:\+|00)?(?:[\-\/\(\)]?\d){11}$)[^\d\n]
Even though I put the ^ and $ into the look-ahead, the replacement seems to work only from the beginning of each line up to the very first digit.
I know in real life the replacement and the check should/would go separate ways, however I'm curious if I could mix look-ahead/look-behind with string operations like replace, delete, take the string apart and put together as I like.
UPDATE: This is what would do the trick, however I find it a bit "ugly". Is there any prettier solution?
https://regex101.com/r/yT5dA4/2
Or the version which I asked originally, where only digits remains: regex101.com/r/yT5dA4/3
You cannot replace or delete text with a regex alone. A regex is just a tool for matching certain strings; the host program then takes some action depending on the matched text, e.g. performs a substitution or retrieves the second capture group.
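Splitting the work into "match, then act" can be sketched as follows. This uses Python as an assumed host language and reuses the two patterns from the question; the function name normalise is made up.

```python
import re

# Validation pattern from the question: optional + or 00, then exactly
# 11 digits, each optionally preceded by one separator character.
VALID_RE = re.compile(r"^(?:\+|00)?(?:[\-\/\(\)]?\d){11}$")
NON_DIGIT_RE = re.compile(r"\D")

def normalise(line: str) -> str:
    """Strip non-digits from lines that pass the check; leave others alone."""
    if VALID_RE.match(line):
        return NON_DIGIT_RE.sub("", line)
    return line

for line in ["+48/30/1234567", "+(48)30/1234567", "+483011223344556677"]:
    print(normalise(line))
```

The first two lines come out as 48301234567, while the 18-digit line fails the check and is printed unchanged. Note that a 00 prefix consists of digits, so it survives the stripping step.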
However it is possible to perform certain decisions within a regex engine, by using conditionals. The common syntax for this, with a lookahead assertion, is (?(?=regex)then|else).
With conditionals you can change the behaviour depending on how the text matches the regex. For your example you could do something like:
^(\+)?(?(1)\(|\d)
If the phone number starts with a plus it must be followed by a bracket, else it should start with a digit. Although in your situation, this is not very useful.
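The conditional above can be tried in Python, whose re module supports conditionals that test a capture group, such as (?(1)...|...), but not the lookahead form (?(?=regex)then|else). The sample strings and the name COND_RE are invented for this sketch.

```python
import re

# If group 1 (the plus sign) matched, require "("; otherwise require a digit.
COND_RE = re.compile(r"^(\+)?(?(1)\(|\d)")

results = {s: bool(COND_RE.match(s)) for s in ["+(48)30", "+4830", "4830", "(48)30"]}
for s, ok in results.items():
    print(f"{s!r}: {ok}")
```

Only "+(48)30" and "4830" match: a plus must be followed by a bracket, and without a plus the string must start with a digit.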
If you want to read up more on conditionals in regex you can do so here.

How to optimise this regex to match string (1234-12345-1)

I've got this RegEx example: http://regexr.com?34hihsvn
I'm wondering if there's a more elegant way of writing it, or perhaps a more optimised way?
Here are the rules:
Digits and dashes only.
Must not contain more than 10 digits.
Must have two hyphens.
Must have at least one digit between each hyphen.
Last number must only be one digit.
I'm new to this so would appreciate any hints or tips.
In case the link expires, the text to search is
----------
22-22-1
22-22-22
333-333-1
333-4444-1
4444-4444-1
4444-55555-1
55555-4444-1
666666-7777777-1
88888888-88888888-1
1-1-1
88888888-88888888-22
22-333-
333-22
----------
My regex is: \b((\d{1,4}-\d{1,5})|(\d{1,5}-\d{1,4}))-\d{1}\b
I'm using this site for testing: http://gskinner.com/RegExr/
Thanks for any help,
Nick
Here is a regex I came up with:
(?=\b[\d-]{3,10}-\d\b)\b\d+-\d+-\d\b
This uses a look-ahead to validate the information before attempting the match. So it looks for between 3-10 characters in the class of [\d-] followed by a dash and a digit. And then after that you have the actual match to confirm that the format of your string is actually digit(dash)digit(dash)digit.
From your sample strings this regex matches:
22-22-1
333-333-1
333-4444-1
4444-4444-1
4444-55555-1
55555-4444-1
1-1-1
It also matches the following strings:
22-7777777-1
1-88888888-1
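The match list above can be reproduced with a short sketch, assuming Python's re module rather than the RegExr tester used in the question. The sample strings are taken from the question; the variable names are made up.

```python
import re

# Lookahead limits the part before the final "-d" to 3-10 characters,
# then the main pattern enforces the digits-dash-digits-dash-digit shape.
LOOKAHEAD_RE = re.compile(r"(?=\b[\d-]{3,10}-\d\b)\b\d+-\d+-\d\b")

samples = [
    "22-22-1", "22-22-22", "333-333-1", "333-4444-1", "4444-4444-1",
    "4444-55555-1", "55555-4444-1", "666666-7777777-1",
    "88888888-88888888-1", "1-1-1", "88888888-88888888-22",
    "22-333-", "333-22",
]

matched = [s for s in samples if LOOKAHEAD_RE.search(s)]
print(matched)
```

This prints the seven strings listed above, and the two extra strings 22-7777777-1 and 1-88888888-1 are matched in full as well.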
Your regexp only allows a first and second group of digits with a maximum length of 5. Therefore, valid strings like 1-12345678-1 or 123456-1-1 won't be matched.
This regexp works for the given requirements:
\b(?:\d\-\d{1,8}|\d{2}\-\d{1,7}|\d{3}\-\d{1,6}|\d{4}\-\d{1,5}|\d{5}\-\d{1,4}|\d{6}\-\d{1,3}|\d{7}\-\d{1,2}|\d{8}\-\d)\-\d\b
(RegExr)
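Since the alternation follows a regular rule (a first group of n digits leaves room for 1 to 9 - n digits in the second group), it can also be generated instead of typed by hand. The following is a sketch in Python, which is my choice of language and not something the answer uses; the generated pattern writes \d{1} where the answer writes \d, but is otherwise equivalent.

```python
import re

MAX_DIGITS = 10  # total digits allowed, including the final single digit

# For a first group of n digits, the second group may have 1..(9 - n) digits.
alternatives = "|".join(
    rf"\d{{{n}}}-\d{{1,{MAX_DIGITS - 1 - n}}}" for n in range(1, MAX_DIGITS - 1)
)
PATTERN = re.compile(rf"\b(?:{alternatives})-\d\b")

for s in ["1-12345678-1", "123456-1-1", "4444-666666-1", "22-22-22"]:
    print(f"{s!r}: {bool(PATTERN.search(s))}")
```

The first two strings match; 4444-666666-1 has 11 digits and 22-22-22 ends in two digits, so neither matches.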
You can use this with the m modifier (switch the multiline mode on):
^\d(?!.{12})\d*-\d+-\d$
or this one without the m modifier:
\b\d(?!.{12})\d*-\d+-\d\b
By design these two patterns match at least three digits separated by hyphens (so no need to put a {5,n} quantifier somewhere, it's useless).
The patterns are also built to fail faster:
I have chosen to start them with a digit \d; this way, each beginning of a line or word boundary not followed by a digit is immediately discarded. Also, by consuming only one digit, I know how much of the string can remain.
Then I test the upper limit of the string length with a negative lookahead that tests whether there is one more character than the maximum length (if there are 12 characters at this position, there are at least 13 characters in the string). There is no need to use anything more descriptive than the dot meta-character here; the goal is to quickly test the length.
Finally, I describe the end of the string without doing anything particular. That is probably the slowest part of the pattern, but it doesn't matter since the overwhelming majority of unnecessary positions have already been discarded.
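The length test can be checked at its boundary with a small sketch, assuming Python's re module with re.MULTILINE standing in for the m modifier. The 13-character sample is invented to sit just over the limit.

```python
import re

# After the first digit, at most 11 more characters may follow on the line
# (10 digits + 2 hyphens = 12 characters in total).
LINE_RE = re.compile(r"^\d(?!.{12})\d*-\d+-\d$", re.MULTILINE)

text = "\n".join([
    "1-1-1",           # shortest valid form
    "4444-55555-1",    # 12 characters: exactly at the limit
    "1-88888888-1",    # 12 characters: exactly at the limit
    "44444-55555-1",   # 13 characters: one too many
    "22-22-22",        # last group has two digits
])

found = LINE_RE.findall(text)
print(found)  # ['1-1-1', '4444-55555-1', '1-88888888-1']
```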