I need to match a single digit, 1 through 9. For example, 3 should match but 34 should not.
I have tried:
\d
\d{1}
[1-9]
[1-9]{1}
[1-9]?
They all match 3 and 34. I am using regex for this because it is part of a much larger expression in which I am using alternation.
The problem with all of your examples, of course, is that they match the digit, but don't keep themselves from matching multiple digits next to each other.
In the following example:
Some text with a 3 and a 34 and what about b5 and 64b?
This regex will match only the lone 3. It uses word boundaries, a handy feature.
\b[1-9]\b
It gets more complicated if you want to match single digits inside words, like the 5 in my example, but you didn't specify if you'd want that, so I'll leave that out for now.
Related
I have multiple 24-hour time strings through several files. For example, 1234, which I wish to replace with 12:34.
Finding them is easy, just \d\d\d\d, that I understand and it works. However, what replace string do I need. In other words, say xx:xx, what do I put in place of each x.
I've tried numbers of things to no avail. I'm obviously not understanding how I get it to remember the digits it found and to recall them in the replace string.
If in your example data 4 digits represent 24 hour time strings you could match 2 capturing groups between word boundaries to prevent a match with more then 4 digits. You can Adjust the word boundaries to your requirements.
Match
\b(\d{2})(\d{2})\b
Replace
group1:group2 \1:\2
Explanation
\b Match a word boundary
(\d{2}) Capture in a group 2 digits
(\d{2}) Capture in a group 2 digits
\b Match a word boundary
Note
Matching 4 digits does not verify a valid 24 hour time. You could match that using for example \b([01][0-9]|2[0-3])([0-5][0-9])\b and replace with \1:\2
I have a question about groups in a rule i created to extract dates from text.
Let's consider the following string:
fherfrefercr17hfeuetvbyeituew
The string is composed by everything at the beginning, then there is a number composed by one or two digits and then everything again. I need to extract only the number "17" from the string listed above.
With the following rule i extract only 7 and not 17.
.*(\d{1,2}).*
Can anyone help me with that please?
Overview
Given your pattern:
.*(\d{1,2}).*
This works in the following way:
.* Match any character any number of times
The quantifier here is considered to be greedy because it will match as many characters as possible so long as the pattern matches the string.
\d{1,2} Since your pattern says to match 1 or 2 digits and the previous token is greedy, the regex is just going to match a single digit because this still satisfies the pattern (the previous token stole the first digit).
Code
There are multiple ways you can fix this issue
Method 1
This will simply extract all numbers (1+ digits) from the string. If you want to only match 1 or two digits use \d\d? or \d{1,2} instead.
\d+
\d\d?
\d{1,2}
Method 2
This method turns the greedy quantifier * (in .*) into a lazy quantifier .*?. This will match any character any number of times, but as few as possible. The drawback to this method is that it's expensive because the engine needs to backtrack.
.*?\d{1,2}.*
Method 3
This method matches any non-digit character any number of times, then it matches one or two digits. This is likely the solution you're looking for.
\D*(\d{1,2}).*
I have files with these filename:
ZATR0008_2018.pdf
ZATR0018_2018.pdf
ZATR0218_2018.pdf
Where the 4 digits after ZATR is the issue number of magazine.
With this regex:
([1-9][0-9]*)(?=_\d)
I can extract 8, 18 or 218 but I would like to keep minimum 2 digits and max 3 digits so the result should be 08, 18 and 218.
How is possible to do that?
You may use
0*(\d{2,3})_\d
and grab Group 1 value. See the regex demo.
Details
0* - zero or more 0 chars
(\d{2,3}) - Group 1: two or three digits
_\d - a _ followed with a digit.
Here is a PCRE variation that grabs the value you need into a whole match:
0*\K\d{2,3}(?=_\d)
See another regex demo
Here, \K makes the regex engine omit the text matched so far (zeros) and then matches 2 to 3 digits that are followed with _ and a digit.
(?:[1-9][0-9]?)?[0-9]{2}(?=_[0-9])
or perhaps:
(?:[1-9][0-9]+|[0-9]{2})(?=_[0-9])
(https://www.freeformatter.com/regex-tester.html, which claims to use the XRegExp library, that you mention in another answer doesn't seem to backtrack into the (?:)? in my first suggestion where necessary, which makes it very different from any regex engine I've encoutered before and makes it prefer to match just the 18 of 218 even though it starts later in the string. But it does work with my second suggestion.
([1-9]\d{2,3})(?=_\d)
{x,y} will match from x to y times the previous pattern, in this case \d
Edit: from your own regex it looked as you wanted the part of the number which starts with a non-zero. However since your examples include leading 0s, maybe you really wanted :
(\d{2,3})(?=_\d)
Which will give you the last 3 digits before underscore unless there are only 2 digits.
I propose you:
^ZATR0*(\d{2,3})_\d+\.pdf$
demo code here. Result:
Match 1 Full match 0-17 ZATR0008_2018.pdf Group 1. 6-8 08
Match 2 Full match 18-35 ZATR0018_2018.pdf Group 1. 24-26 18
Match 3 Full match 36-53 ZATR0218_2018.pdf Group 1. 41-44 218
I need to be able to check if a string contains either a 2 digit or a 4 digit number before a . (period).
For example, 39. is good, and so is 3926., but 392. is not.
I originally had (^\\d{2,4).$) but that allows between a 2 and a 4 digit number preceding a period.
I also tried (^\\d{2}.|\\d{4}.$) but that didn't work.
You can use this regex:
^\d{2}(?:\d{2})?\.$
This regex makes 2nd set of \d{2} optional thus allowing to match 12. or 1234. but not 123..
In the expression (^\d{2}.|\d{4}.$), the dots match any character.
Try escaping them to make them match literal dots: (^\d{2}\.|\d{4}\.$)
My RegExp is very rusty! I have two questions, related to the following RegExp
Question Part 1
I'm trying to get the following RegExp to work
^.*\d{1}\.{1}\d{1}[A-Z]{5}.*$
What I'm trying to pass is x1.1SMITHx or x1.1.JONESx
Where x can be anything of any length but the SMITH or JONES part of the input string is checked for 5 upper case characters only
So:
some preamble 1.1SMITH some more characters 123
xyz1.1JONES some more characters 123
both pass
But
another bit of string1.1SMITHABC some more characters 123
xyz1.1ME some more characters 123
Should not pass because SMITH now contains 3 additional characters, ABC, and ME is only 2 characters.
I only pass if after 1.1 there are 5 characters only
Question Part 2
How do I match on specific number of digits ?
Not bothered what they are, it's the number of them that I can't get working
if I use ^\d{1}$ I'd have thought it'll only pass if one digit is present
It will pass 5 but it also passes 67
It should fail 67 as it's two digits in length.
The RegExp should pass only if 1 digit is present.
For the first one, check out this regex:
^.*\d\.\d[A-Z]{5}[^A-Z]*$
Before solving the problem, I made it easier to read by removing all of the {1}. This is an unnecessary qualifier since regex will default to looking for one character (/abc/ matches abc not aaabbbccc).
To fix the issue, we just need to replace your final .*. This says match 0+ characters of anything. If we make this "dot-match-all" more specific (i.e. [^A-Z]), you won't match SMITHABC.
I came up with a number of solution but I like these most. If your RegEx engine supports negative look-ahead and negative look-behind, you can use this:
Part 1: (?<![A-Z])[A-Z]{5}(?![A-Z])
Part 2: (?<!\d)\d(?!\d)
Both have a pattern of (?<!expr)expr(?!expr).
(?<!...) is a negative look-behind, meaning the match isn't preceded by the expression in the bracket.
(?!...) is a negative look-ahead, meaning the match isn't followed by the expression in the bracket.
So: for the first pattern, it means "find 5 uppercase characters that are neither preceded nor followed by another uppercase character". In other words, match exactly 5 uppercase characters.
The second pattern works the same way: find a digit that is not preceded or followed by another digit.
You can try it on Regex 101.