I'm trying to write a regex for an alphanumeric string of length 7 (with letters at positions 1, 3, 4 and digits at positions 2, 5, 6, 7).
[a-zA-Z]|[0-9]|[a-zA-Z]|[a-zA-Z]|[0-9]|[0-9]|[0-9]
Can someone help me?
The sequence "character, digit, character, character, digit, digit, digit" is expressed in regex as
[a-zA-Z][0-9][a-zA-Z]{2}[0-9]{3}
If you're working in PCRE (with say, PHP):
^([a-zA-Z])([0-9])(?1){2}(?2){3}$
Breakdown:
^ - from the start of the string
([a-zA-Z]) - match and capture a single character in the ranges given: a-z, A-Z
([0-9]) - match and capture a single character in the ranges given: 0-9
(?1){2} - run the pattern of the first group two more times (a subroutine call: it reuses the pattern, not the text that was captured)
(?2){3} - run the pattern of the second group three more times (subroutine call)
$ - the end of the string
If you want to match this in the middle of a sentence, exchange ^ and $ for \b - which will match a word boundary
See the demo
If you're not using PCRE:
^[a-zA-Z][0-9][a-zA-Z]{2}[0-9]{3}$
This does the same thing, but repeats the character classes instead of reusing the groups.
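For what it's worth, here is a minimal Python sketch of the non-PCRE version. Python is my assumption, since the question doesn't name a language; as far as I know its built-in re module has no (?1) subroutine calls, so the plain pattern is the one to use there. The sample strings and the helper name is_code are made up for illustration. The sketch also shows why the original attempt fails: with | between the classes, the pattern matches any single letter or digit.

```python
import re

# Pattern from the answer: letter, digit, two letters, three digits.
CODE = re.compile(r"[a-zA-Z][0-9][a-zA-Z]{2}[0-9]{3}")

# The asker's attempt: the alternation makes it match ONE letter or ONE digit.
ATTEMPT = re.compile(r"[a-zA-Z]|[0-9]|[a-zA-Z]|[a-zA-Z]|[0-9]|[0-9]|[0-9]")


def is_code(text):
    """Return True if the whole string has the 7-character shape (hypothetical helper)."""
    return CODE.fullmatch(text) is not None


print(is_code("A1bc234"))   # letters at 1, 3, 4 and digits at 2, 5, 6, 7
print(is_code("A1b2345"))   # position 4 is a digit, so this is rejected
print(ATTEMPT.fullmatch("A") is not None)  # the attempt accepts a lone letter

# Inside a sentence, swap ^ and $ for \b as the answer suggests.
print(re.findall(r"\b[a-zA-Z][0-9][a-zA-Z]{2}[0-9]{3}\b", "ids: A1bc234, x9yz000."))
```

fullmatch plays the role of ^ and $ here, so the pattern itself can stay unanchored.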
I've been trying to solve this problem for a few hours, but with no luck. The task is to write a regular expression that matches at least four words starting with the same letter. But these words do not have to come one after another.
This regex should be able to match a line like this:
cat color coral chat
but also one like this:
cat take boom candle creepy drum cheek
Thank you!
So far I have got this regex but it only matches words when they are in order.
(\w)\w+\s+\1\w+\s+\1\w+\s+\1
If you have only words in the line that can be matched with \w:
\b(\w)\w*(?:(?:\s+\w+)*?\s+\1\w*){3}
Explanation
\b A word boundary to prevent a partial word match
(\w)\w* Capture a single word character in group 1 followed by matching optional word characters
(?: Non capture group to repeat as a whole part
(?:\s+\w+)*? Lazily match zero or more intervening words (1+ whitespace chars followed by 1+ word chars), to skip words that do not start with the character captured in group 1
\s+\1\w* Match 1+ whitespace chars, a backreference to the same captured character and optional word characters
){3} Close the non capture group and repeat 3 times
See a regex demo
Note that \s can also match a newline.
If the words that start with the same character should be at least 2 characters long (as (\w)\w+ matches 2 or more characters), use:
\b(\w)\w+(?:(?:\s+\w+)*?\s+\1\w+){3}
See another regex demo.
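A quick way to try the first pattern against the lines from the question, sketched in Python (my choice of language; the helper name shared_initial and the third, non-matching line are made up for illustration):

```python
import re

# First pattern from the answer; words are assumed to consist of \w chars only.
FOUR_SAME = re.compile(r"\b(\w)\w*(?:(?:\s+\w+)*?\s+\1\w*){3}")


def shared_initial(line):
    """Return the shared first character if four words start with it, else None."""
    m = FOUR_SAME.search(line)
    return m.group(1) if m else None


for line in ("cat color coral chat",
             "cat take boom candle creepy drum cheek",
             "cat take boom candle"):
    print(repr(line), "->", shared_initial(line))
```

The last line has only two words starting with c, so no match is reported for it.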
Another idea to match lines with at least 4 words starting with the same letter:
\b(\w)(?:.*?\b\1){3}
See this demo at regex101
This is less precise: it only checks that, after the character captured by \b(\w), there are three more word boundaries \b, each immediately followed by that same character \1, with any characters .*? in between.
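To make the difference concrete, here is a small Python sketch (Python and the sample lines are my assumptions) comparing this looser pattern with the whitespace-based one from the other answer. One difference I'm sure of: the looser pattern only needs word boundaries, so comma-separated words count too, whereas the \s+ pattern needs whitespace between the words.

```python
import re

# Looser idea: three more word starts with the same character, anything in between.
LOOSE = re.compile(r"\b(\w)(?:.*?\b\1){3}")
# Whitespace-based pattern from the other answer, for comparison.
STRICT = re.compile(r"\b(\w)\w*(?:(?:\s+\w+)*?\s+\1\w*){3}")

samples = [
    "cat take boom candle creepy drum cheek",  # four c-words, space separated
    "cat,color,coral,chat",                    # four c-words, comma separated
    "cat dog bird fish",                       # no repeated initial
]

for s in samples:
    print(repr(s), "loose:", bool(LOOSE.search(s)), "strict:", bool(STRICT.search(s)))
```

Which behaviour is right depends on what counts as a word separator in your input.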
I'm new to regular expressions and trying to figure out which expression would match the 1st, 2nd, 3rd and the 6th, 7th, 8th, 9th characters of a string, counting from the end of the string. It would also need to use \D (for non-digits), so that if, say, the 3rd character from the end is a digit, it is excluded.
Example of a string is
Wsd-kaf_23psd_trees32rap
So the result should be:
reesrap
or for
Wsd-kaf_23psd_trees324ap
it would be
reesap
This
(?<=^.{9}).*
gives me last 9 chars, but that's not really what I want.
Does anyone know how I can do that?
Thanks.
You could use an alternation to match either everything up to the position that is followed by exactly 9 characters before the end, or any run of consecutive digits:
(?:^.*(?=.{9})|\d+)
See an online demo. Replace with empty string.
(?: - Open non-capture group;
^.* - Any 0+ characters (greedy), up to;
(?=.{9}) - A positive lookahead to assert position is followed by 9 characters;
| - Or;
\d+ - 1+ digits.
If, however, your intention was to match the characters separately, then try:
\D(?=.{0,8}$)
See an online demo. This matches any non-digit that is followed by 0 to 8 characters before the end of the line.
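Both approaches can be checked against the two strings from the question. A minimal Python sketch (Python is my assumption, and the function names are made up for illustration):

```python
import re


def tail_by_removal(text):
    """Delete everything before the last 9 characters, plus any digits."""
    return re.sub(r"(?:^.*(?=.{9})|\d+)", "", text)


def tail_by_matching(text):
    """Collect each non-digit that sits within the last 9 characters."""
    return "".join(re.findall(r"\D(?=.{0,8}$)", text))


for s in ("Wsd-kaf_23psd_trees32rap", "Wsd-kaf_23psd_trees324ap"):
    print(s, "->", tail_by_removal(s), "/", tail_by_matching(s))
```

For these two inputs both functions return reesrap and reesap respectively, which is what the question asks for.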
Please help me compose a working regular expression.
Conditions:
There can be a maximum of 9 characters (from 1 to 9).
The first eight characters can only be uppercase letters.
The last character can only be a digit.
Examples:
Should not match:
S3
FT5
FGTU7
ERTYUOP9
ERTGHYUKM
Should match:
E
ERT
RTYUKL
VBNDEFRW3
I tried using the following:
^[A-Z]{1,8}\d{0,1}$
but in this case, the FT5 example matches, although it shouldn't.
You may use an alternation based regex:
^(?:[A-Z]{1,8}|[A-Z]{8}\d)$
RegEx Demo
RegEx Details:
^: Start
(?:: Start non-capture group
[A-Z]{1,8}: Match 1 to 8 uppercase letters
|: OR
[A-Z]{8}\d: Match 8 uppercase letters followed by a digit
): End non-capture group
$: End
You might also rule out 1 to 7 uppercase chars followed by a digit using a negative lookahead:
^(?![A-Z]{1,7}\d)[A-Z]{1,8}\d?$
^ Start of string
(?![A-Z]{1,7}\d) Negative lookahead to assert not 1-7 uppercase chars and a digit
[A-Z]{1,8} Match 1-8 times an uppercase char
\d? Match an optional digit
$ End of string
Regex demo
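The alternation and the negative-lookahead patterns can be run against the examples from the question to confirm they agree. A small Python sketch (Python is my assumption; the list names are made up):

```python
import re

ALTERNATION = re.compile(r"^(?:[A-Z]{1,8}|[A-Z]{8}\d)$")
LOOKAHEAD = re.compile(r"^(?![A-Z]{1,7}\d)[A-Z]{1,8}\d?$")

# Examples copied from the question.
should_match = ["E", "ERT", "RTYUKL", "VBNDEFRW3"]
should_not_match = ["S3", "FT5", "FGTU7", "ERTYUOP9", "ERTGHYUKM"]

for pattern in (ALTERNATION, LOOKAHEAD):
    # Keep only the inputs this pattern accepts.
    accepted = [s for s in should_match + should_not_match if pattern.search(s)]
    print(pattern.pattern, "->", accepted)
```

Both patterns accept exactly the four strings that should match and reject FT5.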
With a regex engine that supports possessive quantifiers, you can write:
^[A-Z]{1,7}+(?:[A-Z]\d?)?$
demo
The letter in the optional group can only succeed when the quantifier in [A-Z]{1,7}+ reaches the maximum and when a letter remains. The letter in the group can only be the 8th character.
For the .NET regex engine (which doesn't support possessive quantifiers), you can write this pattern using an atomic group:
^(?>[A-Z]{1,7})(?:[A-Z]\d?)?$
I need to match only section numbers in text with a dot at the end. For example, having a string:
'A.8. 8.4.2.4.1.2. 9.1. 9. 10.0.1.1. 9 0.1 100. 100.5. A.500'
What I want to match: [A.8., 8.4.2.4.1.2., 9.1., 9., 10.0.1.1., 100., 100.5.]
What I have matched: [A.8., 8.4.2.4.1.2., 9.1., 10.0.1.1., 100.5.]
My regex is (?:\d+|A)\.[\d+\.]*\.
Those numbers, without the dot at the end are not matched, which is correct. However, singular numbers with a dot should be matched but are not (such as '9.' and '100.')
How can I update my regex to make it work?
You may use
\b(?:\d+|A)(?:\.\d+)*\.\B
See the regex demo.
If the matches are whitespace-separated you can also use (granted the regex engine supports lookbehinds):
(?<!\S)(?:\d+|A)(?:\.\d+)*\.(?!\S)
See the regex demo.
Details:
\b - a word boundary (start of string or a non-word char should immediately precede the next digit or A)
(?:\d+|A) - one or more digits, or an A
(?:\.\d+)* - zero or more repetitions of a dot and then one or more digits
\. - a dot
\B - a position other than a word boundary (end of string or a non-word char should follow immediately)
The (?<!\S) / (?!\S) require a whitespace or start/end of string positions on both ends of the match.
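Here is how the two patterns behave on the string from the question, sketched in Python (my assumption; the question doesn't say which language is used, and the variable names are made up):

```python
import re

text = "A.8. 8.4.2.4.1.2. 9.1. 9. 10.0.1.1. 9 0.1 100. 100.5. A.500"

# Word-boundary version.
BOUNDARY = re.compile(r"\b(?:\d+|A)(?:\.\d+)*\.\B")
# Whitespace-delimited version (needs lookbehind support, which Python's re has).
WHITESPACE = re.compile(r"(?<!\S)(?:\d+|A)(?:\.\d+)*\.(?!\S)")

# Neither pattern has capturing groups, so findall returns the full matches.
print(BOUNDARY.findall(text))
print(WHITESPACE.findall(text))
```

Both calls return the seven section numbers listed in the question, including 9. and 100., while 9, 0.1 and A.500 are skipped.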
I need to find words in a string that start with a number (i.e. a digit).
In the following string:
1st 2nd 3rd a56b 5th 6th ***7th
The words 1st 2nd 3rd 5th 6th should be returned.
I tried with the regex:
(\b[^ a-zA-Z ^ *]+(th|rd|st|nd))+
This regex returns the words that do not start with a letter, but it can't handle the case where a word starts with special characters.
For the current string, you may use a pattern like
(?<!\S)\d+(?:th|rd|st|nd)\b
See the regex demo
The pattern matches:
(?<!\S) - a location at the start of a string or after a whitespace
\d+ - 1 or more digits
(?:th|rd|st|nd) - one of the four alternatives
\b - a word boundary.
If you plan to match any 0+ non-whitespace chars after a digit that is preceded with a whitespace or is at the start of a string, use
(?<!\S)\d\S*
where \S* will match any 0+ non-whitespace chars.
See this regex demo.
NOTE: If lookbehind is not supported, replace (?<!\S) with (?:^|\s) and wrap the rest of the pattern in a capturing group so you can retrieve the word afterwards:
(?:^|\s)(\d\S*)
and the value will be in Group 1.
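The three variants can be compared on the sample string. A minimal Python sketch (Python is my assumption; the variable names are made up for illustration):

```python
import re

text = "1st 2nd 3rd a56b 5th 6th ***7th"

# Digits plus an ordinal suffix, starting at the string start or after whitespace.
ordinals = re.findall(r"(?<!\S)\d+(?:th|rd|st|nd)\b", text)

# Any run of non-whitespace chars that begins with a digit.
digit_words = re.findall(r"(?<!\S)\d\S*", text)

# Lookbehind-free variant: findall returns the contents of Group 1.
no_lookbehind = re.findall(r"(?:^|\s)(\d\S*)", text)

print(ordinals)
print(digit_words)
print(no_lookbehind)
```

On this input all three lists contain 1st, 2nd, 3rd, 5th and 6th; a56b and ***7th are skipped because they do not start with a digit.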
To get words that start with a digit and end with th/st/nd/rd, you can try this:
((?<!\S)(\d+)(th|rd|nd|st))
(?<!\S) detects the word's starting position
\d+ matches 1 or more digits
th|rd|st|nd matches one among those 4.
You can check it here