Finding words in a string that start with a number (Regex)

I need to find words in a string that start with a number (i.e. a digit).
In the following string:
1st 2nd 3rd a56b 5th 6th ***7th
The words 1st 2nd 3rd 5th 6th should be returned.
I tried with the regex:
(\b[^ a-zA-Z ^ *]+(th|rd|st|nd))+
But this regex only excludes words that start with letters; it can't handle words that start with special characters.

For the current string, you may use a pattern like
(?<!\S)\d+(?:th|rd|st|nd)\b
See the regex demo
The pattern matches:
(?<!\S) - a location at the start of a string or after a whitespace
\d+ - 1 or more digits
(?:th|rd|st|nd) - one of the four alternatives
\b - a word boundary.
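As a quick check of the pattern above, here is a minimal Python sketch (Python is my choice here, the question does not name a language) that runs it against the string from the question. The names ORDINAL and ordinals are only illustrative.

```python
import re

# The sample string from the question.
text = "1st 2nd 3rd a56b 5th 6th ***7th"

# (?<!\S) : start of string or preceded by whitespace
# \d+     : one or more digits
# (?:...) : one of the four ordinal suffixes
# \b      : word boundary after the suffix
ORDINAL = re.compile(r"(?<!\S)\d+(?:th|rd|st|nd)\b")

ordinals = ORDINAL.findall(text)
print(ordinals)  # ['1st', '2nd', '3rd', '5th', '6th']
```

Note that a56b and ***7th are skipped because their digits are preceded by a non-whitespace character.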
If you plan to match any 0+ non-whitespace chars after a digit that is preceded with a whitespace or is at the start of a string, use
(?<!\S)\d\S*
where \S* will match any 0+ non-whitespace chars.
See this regex demo.
NOTE: In case the lookbehind is not supported, replace (?<!\S) with (?:^|\s) and also wrap the rest of the pattern with a capturing group to access the latter later:
(?:^|\s)(\d\S*)
and the value will be in Group 1.
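To compare the lookbehind version with the capturing-group fallback, here is a small Python sketch on the question's string. Python's re module does support this lookbehind, so the second pattern is shown only to illustrate the workaround; the variable names are made up for the example.

```python
import re

text = "1st 2nd 3rd a56b 5th 6th ***7th"

# Lookbehind version: the whole match is the word.
with_lookbehind = re.findall(r"(?<!\S)\d\S*", text)

# Fallback without lookbehind: the leading whitespace is consumed,
# so the word itself is read from Group 1. With exactly one capturing
# group, re.findall returns the contents of that group.
with_group = re.findall(r"(?:^|\s)(\d\S*)", text)

print(with_lookbehind)  # ['1st', '2nd', '3rd', '5th', '6th']
print(with_group)       # ['1st', '2nd', '3rd', '5th', '6th']
```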

To get words that start with a digit and end with th/st/nd/rd, you can try this:
((?<!\S)(\d+)(th|rd|nd|st))
(?<!\S) detects the word's starting position
\d+ matches 1 or more digits
th|rd|st|nd matches one among those 4.
You can check it here

Related

Regular expression that matches at least 4 words starting with the same letter?

I've been trying to solve this problem for a few hours but with no luck. The task is to write a regular expression that matches at least four words starting with the same letter. But! These words do not have to be one after another.
This regex should be able to match a line like this:
cat color coral chat
but also one like this:
cat take boom candle creepy drum cheek
Thank you!
So far I have got this regex but it only matches words when they are in order.
(\w)\w+\s+\1\w+\s+\1\w+\s+\1
If you have only words in the line that can be matched with \w:
\b(\w)\w*(?:(?:\s+\w+)*?\s+\1\w*){3}
Explanation
\b A word boundary to prevent a partial word match
(\w)\w* Capture a single word character in group 1 followed by matching optional word characters
(?: Non capture group to repeat as a whole part
(?:\s+\w+)*? Optionally repeat, as few times as possible, 1+ whitespace chars and 1+ word chars, to skip the words in between that do not start with the character captured in group 1
\s+\1\w* Match 1+ whitespace chars, a backreference to the same captured character and optional word characters
){3} Close the non capture group and repeat 3 times
See a regex demo
Note that \s can also match a newline.
If the words that start with the same character should be at least 2 characters long (as (\w)\w+ matches 2 or more characters):
\b(\w)\w+(?:(?:\s+\w+)*?\s+\1\w+){3}
See another regex demo.
Another idea to match lines with at least 4 words starting with the same letter:
\b(\w)(?:.*?\b\1){3}
See this demo at regex101
This is not very accurate: it only checks that, to the right of the character captured by \b(\w), there are three more \b word boundaries each followed by \1, with .*? matching any characters in between.

RegExp - find 1,2,3,6,7,8 and 9th letter from the end of the string

I'm new to regular expressions and trying to figure out which expression would match 1,2,3 and 6,7,8,9th letter in the string, starting from the end of the string. It would also need to include \D (for non-digits), so if 3rd letter from the end is a number it will exclude it.
Example of a string is
Wsd-kaf_23psd_trees32rap
So the result should be:
reesrap
or for
Wsd-kaf_23psd_trees324ap
it would be
reesap
This
(?<=^.{9}).*
gives me last 9 chars, but that's not really what I want.
Does anyone know how I can do that?
Thanks.
You could try to use an alternation to match either all characters up to the position that is followed by the last 9 characters of the string, or consecutive digits:
(?:^.*(?=.{9})|\d+)
See an online demo. Replace with empty string.
(?: - Open non-capture group;
^.* - Any 0+ characters (greedy) from the start of the string, up to;
(?=.{9}) - A positive lookahead to assert position is followed by 9 characters;
| - Or;
\d+ - 1+ digits.
If, however, your intention was to match the characters separately, then try:
\D(?=.{0,8}$)
See an online demo. This matches any non-digit that is followed by 0-8 characters before the end of the line.
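Both approaches can be sketched in Python as follows, using the two strings from the question. This is only an illustration under the assumption that each string is a single line; the variable names are invented for the example.

```python
import re

samples = ["Wsd-kaf_23psd_trees32rap", "Wsd-kaf_23psd_trees324ap"]

# Approach 1: delete everything before the last 9 characters, plus any digits.
removed = [re.sub(r"(?:^.*(?=.{9})|\d+)", "", s) for s in samples]

# Approach 2: match each non-digit within the last 9 characters and join them.
joined = ["".join(re.findall(r"\D(?=.{0,8}$)", s)) for s in samples]

print(removed)  # ['reesrap', 'reesap']
print(joined)   # ['reesrap', 'reesap']
```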

Regex: match only last 2 digits and ignore whitespaces at the end of line

Example:
32-12•
32-12•••
32-12-52••
32-12-53-12
(let's say each bullet point "•" is a whitespace character)
What I have tried is
/(?<=^.*)\d{2}(?= *)$/gm
but it seems to match the last 2 digits only when no whitespace follows them, like this:
32-12•
32-12•••
32-12-52••
32-12-53-[12]
(square brackets mark where the regex matched)
but what I want is the last 2 digits on every line, ignoring trailing whitespace (matches shown in square brackets):
32-[12]•
32-[12]•••
32-12-[52]••
32-12-53-[12]
You can use
\d{2}(?= *$)
See the regex demo. To match any whitespace, replace the literal space with the \s shorthand character class: \d{2}(?=\s*$).
Details:
\d{2} - two digits
(?= *$) - a positive lookahead that requires zero or more spaces followed by the end of the string (or the end of the line, with the m flag) immediately to the right of the current location.
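Here is a minimal Python sketch of that pattern. The question's regex literal suggests JavaScript, so treat this as a translation: re.MULTILINE plays the role of the m flag, and the sample text uses real spaces where the question shows bullet points.

```python
import re

# Four lines; the first three end in trailing spaces.
text = "32-12 \n32-12   \n32-12-52  \n32-12-53-12"

# Two digits followed only by optional spaces up to the end of the line.
last_two = re.findall(r"\d{2}(?= *$)", text, flags=re.MULTILINE)

print(last_two)  # ['12', '12', '52', '12']
```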

Checking for whitespaces with RegEx

I have strings that look like some text - other text and I need to delete everything before and including the hyphen - and the space after it
But due to typos I might have:
some text -other text or some text- other text or some text-other text or double spaces instead of single spaces
I am using RegEx ^.*\s+\-\s+ and this works for some text - other text with single or multiple spaces before and after the -
But for the other possibilities, where the whitespace is missing, I have added two alternatives (ors), so I have ^.*\s+\-\s+|.*\-\s|.*\-
Is there a more concise pattern that does not use multiple ors for this?
Thank you for any help on this
https://regex101.com/r/TNU7i6/1
Instead of using an alternation with 3 patterns, you might use a pattern to match all except the -, then match the - and optional whitespace chars.
^[^-]*-\s*
Regex demo
If there should be a non whitespace char following, and a lookahead is supported:
^[^-]*-\s*(?=\S)
^ Start of string
[^-]*- Match 0+ times any char except -, then match -
\s* Match optional whitespace chars
(?=\S) Positive lookahead, assert a non whitespace char to the right
Regex demo
Note that \s and the negated character class [^-] can also match a newline.
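A short Python sketch of the first pattern, applied with re.sub to the variants listed in the question (the double-space sample is my own rendering of "double spaces instead of single spaces"). PREFIX and cleaned are illustrative names.

```python
import re

# Everything up to and including the first hyphen, plus any whitespace after it.
PREFIX = re.compile(r"^[^-]*-\s*")

samples = [
    "some text - other text",
    "some text -other text",
    "some text- other text",
    "some text-other text",
    "some text  -  other text",  # assumed double-space variant
]

cleaned = [PREFIX.sub("", s) for s in samples]
print(cleaned)  # ['other text', 'other text', 'other text', 'other text', 'other text']
```

Because [^-]* stops at the first hyphen, a hyphen later in the text is left alone.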
1st solution: With your shown samples, please try following.
^.*?\s+\S+\s?-\s*(.*)$
OR
^.*?\s+\S+\s*-\s*(.*)$
Online demo for above regex
2nd solution: You could use \K option too to forget matched regex part, in that case try:
^.*?\s+\S+\s?-\s*\K.*$
OR
^.*?\s+\S+\s*-\s*\K.*$
Online demo for above regex
1st solution explanation:
^.*?\s+ ##From starting of value matching till 1st occurrence of space(s).
\S+\s? ##Matching 1 or more non-space occurrences followed by optional space here.
-\s* ##Matching - followed by optional space.
(.*)$ ##Matching everything till last of value.
2nd solution explanation:
^.*?\s+ ##Matching everything till 1st space occurrence(s) from starting of value.
\S+\s? ##Matching non spaces 1 or more occurrences followed by space optional.
-\s*\K ##Matching - followed by spaces(0 or more occurrences) and \K will discard all previous matched values(so that we can match exact values as per output).
.*$ ##Matching everything after previously matched values(which is discarded by \K).
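If you are working in Python, note that the standard re module does not support \K, so only the capture-group variant (1st solution) applies there. A minimal sketch, with the sample strings taken from the question and illustrative names:

```python
import re

# 1st solution: the wanted text ends up in capture group 1.
PATTERN = re.compile(r"^.*?\s+\S+\s*-\s*(.*)$")

samples = [
    "some text - other text",
    "some text -other text",
    "some text- other text",
    "some text-other text",
]

tails = []
for s in samples:
    m = PATTERN.match(s)
    # Keep None when a string does not fit the expected shape.
    tails.append(m.group(1) if m else None)

print(tails)  # ['other text', 'other text', 'other text', 'other text']
```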

Working with regex for alphanumeric

I'm trying to write a regex for an alphanumeric string of length 7 (with positions 1, 3, 4 as letters and positions 2, 5, 6, 7 as digits).
[a-zA-Z]|[0-9]|[a-zA-Z]|[a-zA-Z]|[0-9]|[0-9]|[0-9]
Can someone help me?
The sequence "character, digit, character, character, digit, digit, digit" is expressed in regex as
[a-zA-Z][0-9][a-zA-Z]{2}[0-9]{3}
If you're working in PCRE (with say, PHP):
^([a-zA-Z])([0-9])(?1){2}(?2){3}$
Breakdown:
^ - from the start of the string
([a-zA-Z]) - match and capture a single character in the ranges given: a-z, A-Z
([0-9]) - match and capture a single character in the ranges given: 0-9
(?1){2} - redo the regex in the first group twice (recursive subpattern)
(?2){3} - redo the regex in the second group 3 times (recursive subpattern)
$ - the end of the string
If you want to match this in the middle of a sentence, exchange ^ and $ for \b - which will match a word boundary
See the demo
If you're not using PCRE:
^[a-zA-Z][0-9][a-zA-Z]{2}[0-9]{3}$
Which does the same thing, but has some copy-paste involved
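For example, in Python (whose re module does not support the (?1) subroutine syntax), the non-PCRE pattern can be checked like this. Using fullmatch makes the ^ and $ anchors unnecessary; the candidate strings are made up for illustration.

```python
import re

# letter, digit, letter, letter, digit, digit, digit
CODE = re.compile(r"[a-zA-Z][0-9][a-zA-Z]{2}[0-9]{3}")

candidates = [
    "A1bc234",   # valid: 7 characters in the required order
    "a1BC2345",  # invalid: 8 characters
    "11bc234",   # invalid: position 1 is a digit
    "A1bc23",    # invalid: only 6 characters
]

valid = [bool(CODE.fullmatch(c)) for c in candidates]
print(valid)  # [True, False, False, False]
```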