regex: find one-digit number

regex: find one-digit number - regex

I need to find the text of all the one-digit number.
My code:
$string = 'text 4 78 text 558 my.name#gmail.com 5 text 78998 text';
$pattern = '/ [\d]{1} /';
(result: 4 and 5)
Everything works perfectly, just wanted to ask it is correct to use spaces?
Maybe there is some other way to distinguish one-digit number.
Thanks

First of all, [\d]{1} is equivalent to \d.
As for your question, it would be better to use a zero width assertion like a lookbehind/lookahead or word boundary (\b). Otherwise you will not match consecutive single digits because the leading space of the second digit will be matched as the trailing space of the first digit (and overlapping matches won't be found).
Here is how I would write this:
(?<!\S)\d(?!\S)
This means "match a digit only if there is not a non-whitespace character before it, and there is not a non-whitespace character after it".
I used the double negative like (?!\S) instead of (?=\s) so that you will also match single digits that are at the beginning or end of the string.
I prefer this over \b\d\b for your example because it looks like you really only want to match when the digit is surrounded by spaces, and \b\d\b would match the 4 and the 5 in a string like 192.168.4.5
To allow punctuation at the end, you could use the following:
(?<!\S)\d(?![^\s.,?!])
Add any additional punctuation characters that you want to allow after the digit to the character class (inside of the square brackets, but make sure it is after the ^).

Use word boundaries. Note that the range quantifier {1} (a single \d will only match one digit) and the character class [] is redundant because it only consists of one character.
\b\d\b

Search around word boundaries:
\b\d\b
As explained by the others, this will extract single digits meaning that some special characters might not be respected like "." in an ip address. To address that, see F.J and Mike Brant's answer(s).

It really depends on where the numbers can appear and whether you care if they are adjacent to other characters (like . at the end of a sentence). At the very least, I would use word boundaries so that you can get numbers at the beginning and end of the input string:
$pattern = '/\b\d\b/';
But you might consider punctuation at the end like:
$pattern = '/\b\d(\b|\.|\?|\!)/';

If one-digit numbers can be preceded or followed by characters other than digits (e.g., "a1 cat" or "Call agent 7, pronto!") use
(?<!\d)\d(?!\d)
Demo
The regular expression reads, match a digit (\d) that is neither preceded nor followed by digit, (?<!\d) being a negative lookbehind and (?!\d) being a negative lookahead.

Related

Regex ignore special character with greedy

I used the following regex to catch 10 numbers and letters:
/[a-zA-Z0-9]{10}/g
It works fine if the 10 characters are only numbers and letters.
e.g. input: 12345xcdw034342
it catches 12345xcdw0
But in this case with special characters or space, it doesn't catch it.
123}456712234324Zz3 or 123}45 71223AB3
It should catch 10 numbers and letters regardness of characters.
Any help would be gratefully appreciated.

You can do it but not without any extra processing
As you have not spetified what language you're using Ill use Javascript for being quite universal but the same logic must apply in any language.
Here are the options I can think of
if I have testString = "12#34{56A789BDE"
Match the all until the first ten alphanumeric caracters, and then remove the spetial characters in the resulting string
testString.match(/(\w.*?){10}/)[0].replaceAll(/\W/g, '')
// results '123456A789'
// explanation: we take the first \w and use .*? to indicate that we dont care if the alphanumeric has a non-alphanumeric right next to it, then we clean the result by removing \W which means non-alphanumeric
Match only the first ten alphanumeric caracters and then join them to make a result string
testString.match(/\w/g).splice(0,10).join('')
// results '123456A789'
// explanation: we match 10 groups of aphanumeric characters represented by \w (note the lowercase) and we join the first 10 (using splice to get them) as each group "()" is in the case of javascript returned as an element of an array of matches
Remove the spetial characters from your string and then take the first ten
testString.replaceAll(/\W/g,'').match(/\w{10}/)[0]
// results '123456A789'
// explanation: we replace \W which means non alpha numeric characters, with '' to delete them then we match the first ten

You can use
/[a-zA-Z0-9](?:[^a-zA-Z0-9]*[a-zA-Z0-9]){9}/g
See the regex demo. Details:
[a-zA-Z0-9] - an alphanumeric
(?:[^a-zA-Z0-9]*[a-zA-Z0-9]){9} - nine occurrences of any zero or more chars other than an alphanumeric char and then an alphanumeric char.

Regex how can i get only exact part in a string

I should only catch numbers which are fit the rules.
Rules:
it should be 16 digit
first 11 digit can be any number
after 3 digit should have all zero
last two digit can be any number.
I did this way;
([0-9]{11}[0]{3}[0-9]{2})
number example:
1234567890100012
now I want to get the number even it has got any letter beginning or ending of the string like " abc1234567890100012abc"
my output should be just number like "1234567890100012"
When I add [a-zA-Z]* it gives all string.
Also another point is if there is any number beginning or ending of the string like "999912345678901000129999". program shouldn't take this. I mean It should return none or nothing. How can I write this with regex.

You can use look around to exclude the cases where there are more digits before/after:
(?<!\d)\d{11}000\d\d(?!\d)
On regex101

You can use a capture group, and match optional chars a-zA-Z before and after the group.
To prevent a partial match, you can use word boundaries \b or if the string should match from the start and end of the line you can use anchors ^ and $
\b[a-zA-Z]*([0-9]{11}000[0-9]{2})[a-zA-Z]*\b
Regex demo

Regex - matching while ignoring some characters

I am trying to write a regex to max a sequence of numbers that is 5 digits long or over, but I ignore any spaces, dashes, parens, or hashes when doing that analysis. Here's what I have so far.
(\d|\(|\)|\s|#|-){5,}
The problem with this is that this will match any sequence of 5 characters including those characters I want to ignore, so something like "#123 " would match. While I do want to ignore the # and space character, I still need the number itself to be 5 digits or more in order to qualify at a match.
To be clear, these would match:
1-2-3-4-5
123 45
2(134) 5
Bonus points if the matching begins and ends with a number rather than with one of those "special characters" I am excluding.
Any tips for doing this kind of matching?

If I understood requirements right you can use:
^\d(?:[()\s#-]*\d){4,}$
RegEx Demo
It always matches a digit at start. Then it is followed by 4 or more of a non-capturing group i.e. (?:[()\s#-]*\d) which means 0 or more of any listed special character followed by a digit.

So just repeat a digit, followed by any other sequence of allowed characters 5 or more times:
^(\d[()\s#-]*){5,}$
You can ensure it ends on a digit if you subtract one of the repetitions and add an explicit digit at the end:
^(\d[()\s#-]*){4,}\d$

You can suggest non-digits with \D so et would be something like:
(\d\D*){5,}
Here is a guide.

RegEx - 1 to 10 Alphanumeric Spaces Okay

New to Regular Expressions. Thanks in advance!
Need to validate field is 1-10 mixed-case alphanumeric and spaces are allowed. First character must be alphanumeric, not space.
Good Examples:
"Larry King"
"L King1"
"1larryking"
"L"
Bad Example:
" LarryKing"
This is what I have and it does work as long as the data is exactly 10 characters. The problem is that it does not allow less than 10 characters.
[0-9a-zA-Z][0-9a-zA-Z ][0-9a-zA-Z ][0-9a-zA-Z ][0-9a-zA-Z ][0-9a-zA-Z ][0-9a-zA-Z ][0-9a-zA-Z ][0-9a-zA-Z ][0-9a-zA-Z ]
I've read and tried many different things but am just not getting it.
Thank you,
Justin

I don't know what environment you are using and what engine. So I assume PCRE (typically for PHP)
this small regex does exact what you want: ^(?i)(?!\s)[a-z\d ]{1,10}$
What's going on?!
the ^ marks the start of the string (delete it, if the expression must not match the whole string)
the (?i) tells the engine to be case insensitive, so there's no need to write all letter lower and upper case in the expression later
the (?!\s) ensures the following char won't be a white space (\s) (it's a so called negative lookahead)
the [a-z\d ]{1,10} matches any letter (a-z), any digit (\d) and spaces () in a row with min 1 and max 10 occurances ({1,10})
the $ at the end marks the end of the string (delete it, if the expression must not match the whole string)
Here's also a small visualization for better understanding.
Debuggex Demo

Try this: [0-9a-zA-Z][0-9a-zA-Z ]{0,9}
The {x,y} syntax means between x and y times inclusive. {x,} means at least x times.

You want something like this.
[a-zA-Z0-9][a-zA-Z0-9 ]{0,9}
This first part ensures that it is alphanumeric. The second part gets your alphanumeric with a space. the {0,9} allows from anywhere from 0 to 9 occurrences of the second part. This will give your 1-10

Try this: ^[(^\s)a-zA-Z0-9][a-z0-9A-Z ]*
Not a space and alphanumeric for the first character, and then zero or more alphanumeric characters. It won't cap at 10 characters but it will work for any set of 1-10 characters.

The below is probably most semantically correct:
(?=^[0-9a-zA-Z])(?=.*[0-9a-zA-Z]$)^[0-9a-zA-Z ]{1,10}$
It asserts that the first and last characters are alphanumeric and that the entire string is 1 to 10 characters in length (including spaces).

I assume that the space is not allowed at the end too.
^[a-zA-Z0-9](?:[a-zA-Z0-9 ]{0,8}[a-zA-Z0-9])?$
or with posix character classes:
^[[:alnum:]](?:[[:alnum:] ]{0,8}[[:alnum:]])?$

i think the simplest way is to go with \w[\s\w]{0,9}
Note that \w is for [A-Za-z0-9_] so replace it by [A-Za-z0-9] if you don't want _
Note that \s is for any white char so replace it by if you don't want the others

Decoding a regex... I know what it's function is but I want to understand exactly what is happening

I have a regular expression that I'm going to be using to verify that an inputted number is in standard U.S. telephone format (i.e (###) ###-####). I am new to regex and still having some trouble figuring out the exact function of each character. If someone would go through this piece by piece/verify that I am understanding I would really appreciate it. Also if the regex is wrong I would obviously like to know that.
\D*?(\d\D*?){10}
What I think is happening:
\D*?( indicates an escape sequence for the parenthesis metacharacter... not sure why the \D*? is necessary
\d indicating digits
\D*? indicating there is a non-digit character (-) followed by the closing parenthesis.
{10} for the 10 digits
I feel very unsure explaining this, like my understanding is very vague in terms of why the regex is in the order that it is etc. Thanks in advance for help/explanations.
EDIT
It seems like this is not the best regex for what I want. Another possibility was [(][0-9]{3}[)] [0-9]{3}-[0-9]{4}, but I was told this would fail. I suppose I'll have to do a little more work with regular expressions to figure this out.

\D matches any non-digit character.
* means that the previous character is repeated 0 or more times.
*? means that the previous character is repeated 0 or more times, but until the match of the following character in the regex. It is a bit difficult perhaps at the start, but in your regex, the next character is \d, meaning \D*? will match the least amount of characters until the next \d character.
( ... ) is a capture group, and is also used to group things. For instance {10} means that the previous character or group is repeated 10 times exactly.
Now, \D*?(\d\D*?){10} will match exactly 10 numbers, starting with non-digit characters or not, with non-digit characters in between the digits if they are present.
[(][0-9]{3}[)] [0-9]{3}-[0-9]{4}
This regex is a bit better since it doesn't just accept anything (like the first regex does) and will match the format (###) ###-#### (notice the space is a character in regex!).
The new things introduced here are the square brackets. These represent character classes. [0-9] means any character between 0 to 9 inclusive, which means it will match 0, 1, 2, 3, 4, 5, 6, 7, 8 or 9. Adding {3} after it makes it match 3 similar character class, and since this character class contains only digits, it will match exactly 3 digits.
A character class can be used to escape certain characters, such as ( or ) (note I mentioned earlier they are for capturing groups, or grouping) and thus, [(] and [)] are literal ( and ) instead of being used for capturing/grouping.
You can also use backslashes (\) to escape characters. Thus:
\([0-9]{3}\) [0-9]{3}-[0-9]{4}
Will also work. I would also recommend the use of line anchors ^ and $ if you're only trying to see if a phone number matches the above format. This ensures that the string has only the phone number, and nothing else. ^ matches the beginning of a line and $ matches the end of a line. Thus, the regex will become:
^\([0-9]{3}\) [0-9]{3}-[0-9]{4}$
However, I don't know all the combinations of the different formats of phone numbers in the US, so this regex might need some tweaking if you have different phone number formats.

\D is "not a digit"; \d is "digit". With that in mind:
This matches zero or more non-digits, then it matches a digit and any number of non-digit characters 10 times. This won't actually verify that the number if formatted properly, just that it contains 10 digits. I suspect that the regex isn't what you want in the first place.
For example, the following will match your regex:
this is some bad text 1 and some more 2 and more 34567890

\D matches a character that is not a digit
* repeats the previous item 0 or more times
? find the first occurrence
\d matches a digit
so your group is matches 10 digits or non digits

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

regex: find one-digit number - regex

Use word boundaries. Note that the range quantifier {1} (a single \d will only match one digit) and the character class [] is redundant because it only consists of one character. \b\d\b

Search around word boundaries: \b\d\b As explained by the others, this will extract single digits meaning that some special characters might not be respected like "." in an ip address. To address that, see F.J and Mike Brant's answer(s).

Related

Regex ignore special character with greedy

Regex how can i get only exact part in a string

Regex - matching while ignoring some characters

RegEx - 1 to 10 Alphanumeric Spaces Okay

Decoding a regex... I know what it's function is but I want to understand exactly what is happening

Categories

Resources