I need a regular expression that matches only numbers of length 7 (they can have leading zeros). I used the following super easy regex: \b[0-9]{7}\b. However, this regex also matches numbers in e.g. 5254-6408499 and (0241)4013999 (see https://regex101.com/r/zF5hV7/1).
How can I prevent them from being matched? I only want numbers of length 7 having leading and/or trailing spaces.
Depending on the regular expression flavor, you could create your own boundaries:
(?<=^| )\d{7}(?= |$)
This asserts that either the beginning of the string or a space precedes the match, then matches exactly 7 digits, and finally asserts that either a space or the end of the string follows.
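Not every flavor accepts (?<=^| ). Python's re module, for instance, rejects it because the two alternatives inside the lookbehind have different widths. Here is a minimal sketch of the same idea in Python (my choice of language, the question doesn't name one), with the boundaries rewritten as negative lookarounds on "any character that is not a space". The sample string is made up from the numbers in the question.

```python
import re

# "Not preceded by a non-space" holds at the start of the string or after a space;
# "not followed by a non-space" holds before a space or at the end of the string.
SEVEN_DIGITS = re.compile(r"(?<![^ ])\d{7}(?![^ ])")

sample = "0012345 5254-6408499 (0241)4013999 7654321"
matches = SEVEN_DIGITS.findall(sample)
print(matches)
```

Since both boundaries are zero-width, no separator is consumed, so two numbers separated by a single space are both found.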
You can use this regex:
(?:^|\s)([0-9]{7})(?:\s|$)
and grab captured group #1
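A rough sketch of how grabbing group #1 could look in Python (an assumption on my part, the question doesn't say which language is used). The sample strings are invented for illustration.

```python
import re

STANDALONE = re.compile(r"(?:^|\s)([0-9]{7})(?:\s|$)")

sample = "call 0012345 or 5254-6408499 or (0241)4013999"
# With a single capture group, findall returns the group contents, not the whole match.
numbers = STANDALONE.findall(sample)
print(numbers)

# The trailing (?:\s|$) consumes the separator, so a number that shares that
# single space with the previous match is skipped.
adjacent = STANDALONE.findall("1234567 7654321")
print(adjacent)
```

If back-to-back numbers matter for your input, zero-width lookarounds instead of the non-capturing groups avoid consuming the separator.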
I need to process numbers that may have optional thousand-separators, such as 1234567 and 1,234,567
I naively assumed I could achieve this with
(\d{1,3}([,]?(\d{3}))*)
This, however, matches only 123456 (not the 7) and 1,234,567 (correctly)
However, if I specify an explicit number of matches (2 in this case)
(\d{1,3}([,]?(\d{3})){2})
or a bound (such as \b)
(\d{1,3}([,]?(\d{3}))*)\b
the full match is performed.
Why does the “greedy” * quantifier stop after the first match in the first regex?
If you want to match both numbers with, and without, proper comma thousands separators, then I would use an alternation:
^(\d{1,3}(?:,\d{3})*|\d+)$
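To try the alternation quickly, here is a small Python sketch (the language and the helper name is_number are my own choices, not something from the question):

```python
import re

NUMBER = re.compile(r"^(\d{1,3}(?:,\d{3})*|\d+)$")

def is_number(text: str) -> bool:
    """Return True for plain digits or digits with well-formed comma groups."""
    return NUMBER.match(text) is not None

for candidate in ["1234567", "1,234,567", "12,34", "1234,567"]:
    print(candidate, is_number(candidate))
```

The first branch handles properly grouped numbers, the second branch handles plain digit runs, and the anchors reject anything with misplaced commas.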
The reason is that \d{1,3} is greedy, so it matches 123 at the beginning of the number. Then the rest of the regexp will only match groups of exactly 3 digits because it uses \d{3}. A regular expression doesn't try to match the longest possible string, so it won't backtrack and shorten the match for \d{1,3} to make the rest of the regexp go further.
But if you add a word boundary \b at the end, it no longer matches with that 3-digit prefix. That causes it to backtrack until it's able to match groups of 3 digits ending with a word boundary.
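You can watch this happen with the two patterns from the question. A minimal sketch, assuming Python's re module (other backtracking engines should behave the same way here):

```python
import re

text = "1234567"

# No boundary: the engine accepts the first way of matching that succeeds.
without_boundary = re.match(r"(\d{1,3}([,]?(\d{3}))*)", text)
# Trailing \b: "123456" fails the boundary check, so the engine backtracks
# until \d{1,3} takes just "1" and the groups take "234" and "567".
with_boundary = re.match(r"(\d{1,3}([,]?(\d{3}))*)\b", text)

print(without_boundary.group(0))  # 123456
print(with_boundary.group(0))     # 1234567
```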
I would like to check all the strings with the format hostname abc_pqr_xyz in a file. Need a regex for this. There should be exactly 2 _'s and 3 words in the string.
I have tried using the regex ^hostname \s+.*_.*_.*
But it is giving a positive result for abc_abc_abc_abc_abc, as it is considering abc_abc_abc as one word.
You may use a [^_] negated character class that matches any char but _ instead of .:
^hostname\s+[^_]*_[^_]*_[^_]*$
Note the $ at the end, which checks for the end of the string.
Also, a literal space before \s+ requires a space followed by 1 or more whitespace chars, i.e. at least two whitespace chars in total. That extra space may be harmful, which is why I removed it from the expression.
Note you may group the _[^_]* and then set the number of repetitions that you may adjust in the future:
^hostname\s+[^_]*(?:_[^_]*){2}$
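As a quick check of the grouped version, here is a sketch in Python (my assumption, the question doesn't name a language). The list of lines stands in for the contents of your file.

```python
import re

HOSTNAME_LINE = re.compile(r"^hostname\s+[^_]*(?:_[^_]*){2}$")

# Hypothetical file contents, one entry per line.
lines = [
    "hostname abc_pqr_xyz",
    "hostname abc_abc_abc_abc_abc",
    "hostname abc_pqr",
]
matching = [line for line in lines if HOSTNAME_LINE.match(line)]
print(matching)
```

Changing {2} to another count is all it takes if the required number of underscores changes later.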
I want to match only individual numbers from the following sample input:
[2,4,7,9-11]
Regular expression should match 2,4 & 7, but not 9-11.
Your targets have no hyphen immediately before or after them:
(?<!-)\b\d+\b(?!-)
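For example, in Python (assumed here only for illustration), running the pattern over the sample input from the question:

```python
import re

sample = "[2,4,7,9-11]"
# 9 is rejected by (?!-) and 11 is rejected by (?<!-).
standalone = re.findall(r"(?<!-)\b\d+\b(?!-)", sample)
print(standalone)  # ['2', '4', '7']
```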
For single character matching this might suffice. \b is a word boundary and \d indicates that we're looking for a single digit.
\b\d\b
If you would like to exclude a single zero, then you would do something like this with a custom range:
\b[1-9]\b
If you're okay with multi-digit numbers and zero, then you would add a plus + (meaning one or more) to the original:
\b\d+\b
To match any single-digit number from the provided input that is not part of a range, you would use boundaries and look-arounds:
\b(?<!-)\d(?!-)\b
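To see how the four variants differ on the sample input, here is a small comparison sketch in Python (the language and the labels in the dictionary are my own, purely for illustration):

```python
import re

sample = "[2,4,7,9-11]"
patterns = {
    "single digit": r"\b\d\b",
    "single non-zero digit": r"\b[1-9]\b",
    "one or more digits": r"\b\d+\b",
    "single digit outside a range": r"\b(?<!-)\d(?!-)\b",
}
results = {label: re.findall(pattern, sample) for label, pattern in patterns.items()}
for label, found in results.items():
    print(label, found)
```

Only the last pattern leaves out the 9, because the plain word boundary \b treats the hyphen like any other non-word character.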
I have a spec that says a particular field will be alpha-text, right-padded with spaces to be 10 characters long, and I want to capture the alpha-part of the match.
This expression captures the entire section:
"([[:alpha:][:s:]]{10})"
However, I only want to capture the alpha part, and still match (but not capture) the remaining whitespace. So if the alpha part is 3 characters long, the rest of the match needs to be 7 whitespace characters.
How can I do this?
I would say your best bet is to use 2 regular expressions. Regex doesn't really have support for what you're trying to do.
The first regular expression would get all strings of length 10 made up of letters and whitespace (note that it does not itself check that the whitespace comes last):
([a-zA-Z\s]{10})
After that, just capture the word part. We know each string is only 10 characters at this point.
(\w+)\s*
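A sketch of the two-step idea in Python (an assumption, since the question's flavor looks different; the function name alpha_part is hypothetical):

```python
import re
from typing import Optional

def alpha_part(field: str) -> Optional[str]:
    """Return the leading word of a 10-character letters-and-whitespace field, else None."""
    # Step 1: the whole field must be exactly 10 letters/whitespace characters.
    if re.fullmatch(r"[a-zA-Z\s]{10}", field) is None:
        return None
    # Step 2: capture the word at the start of the field.
    word = re.match(r"(\w+)\s*", field)
    return word.group(1) if word else None

print(alpha_part("abc" + " " * 7))
print(alpha_part("abc"))
```

Keep in mind that step 1 also lets a field like "abc    def" through, so this only works as intended if the input really is right-padded.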
This regex pattern will match a string starting with (optional) [A-Za-z] characters, followed by up to 10 spaces. On its own it does not yet enforce a total string length of 10.
"^([A-Za-z]+)?\\ {0,10}"
Then, I added a positive lookahead to ensure the pattern only matches when the string length is 10.
"^(?=.{10}$)([A-Za-z]+)?\\ {0,10}$"
Edit: Try this using the [:alpha:] and [:space:] classes (they have to sit inside a bracket expression):
"^(?=.{10}$)([[:alpha:]]+)?[[:space:]]{0,10}$"
I have a list of strings, such as: Ø20X400
I need to extract the first of the numbers, the one between Ø and X.
So far I've managed to match the numbers in general with \d+, as simple as it is...
But I need an expression that gets only the first value, not both of them...
You can use lookarounds (?<=..) and (?=..):
(?<=Ø)\d+(?=X)
or in Java style (with the backslash escaped for a string literal):
(?<=Ø)\\d+(?=X)
A second way is to use a capture group:
Ø(\d+)X
or
Ø(\\d+)X
Then you can extract the content of the group.
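Both ways side by side, as a minimal sketch in Python (assumed for illustration; in a Python raw string the single-backslash form is the one to use):

```python
import re

item = "Ø20X400"

# Lookarounds: the match itself is only the digits.
via_lookaround = re.search(r"(?<=Ø)\d+(?=X)", item).group(0)
# Capture group: the match is "Ø20X", the digits are in group 1.
via_group = re.search(r"Ø(\d+)X", item).group(1)

print(via_lookaround, via_group)  # 20 20
```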
The regex engines I know parse \n as a newline. \d is used for numbers.
The following regex gives you the first number between a Ø and a X in a capture group:
^.*?Ø(\d+)X.*
This regex will do it for you: (\d+?)X. It groups the digits together and ends the match on X. The non-greedy quantifier is harmless here but not required, because \d can never match the X, so (\d+)X captures the same digits.
Try this one:
\d+(?=\D)
This should find the first number that is followed by a non-digit character.
With normal regular expressions, I would say:
Ø(\d+)X
This finds the Ø character, followed by one or more digits, followed by an X. The digits are stored in the first capture group. How you refer to capture groups differs from one regex implementation to another, but the first group is typically denoted by \1. Capture group zero, \0, is usually the matched string itself. In this version, \d denotes the digits 0-9, but if your regex engine uses \n for that purpose (check its documentation first, since most engines treat \n as a newline), use:
Ø(\n+)X
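To illustrate the group numbering, here is a small sketch assuming Python's re module, where groups are read with group(0) and group(1) and \1 is available in replacement strings. Other implementations spell this differently.

```python
import re

match = re.search(r"Ø(\d+)X", "Ø20X400")
whole = match.group(0)   # the entire matched text
first = match.group(1)   # the first capture group

# \1 in the replacement refers to the first capture group.
replaced = re.sub(r"Ø(\d+)X", r"\1", "Ø20X400")

print(whole, first, replaced)
```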