Match only number combinations with a dot at the end with regex

I need to match only section numbers in text with a dot at the end. For example, having a string:
'A.8. 8.4.2.4.1.2. 9.1. 9. 10.0.1.1. 9 0.1 100. 100.5. A.500'
What I want to match: [A.8., 8.4.2.4.1.2., 9.1., 9., 10.0.1.1., 100., 100.5.]
What I have matched: [A.8., 8.4.2.4.1.2., 9.1., 10.0.1.1., 100.5.]
My regex is (?:\d+|A)\.[\d+\.]*\.
Numbers without a dot at the end are not matched, which is correct. However, single numbers followed by a dot (such as '9.' and '100.') should be matched but are not.
How can I update my regex to make it work?

You may use
\b(?:\d+|A)(?:\.\d+)*\.\B
See the regex demo.
If the matches are whitespace-separated, you can also use the following, provided the regex engine supports lookbehinds:
(?<!\S)(?:\d+|A)(?:\.\d+)*\.(?!\S)
See the regex demo.
Details:
\b - a word boundary (start of string or a non-word char should immediately precede the next digit or A)
(?:\d+|A) - one or more digits, or an A
(?:\.\d+)* - zero or more repetitions of a dot and then one or more digits
\. - a dot
\B - a position other than a word boundary (end of string or a non-word char should follow immediately)
The (?<!\S) and (?!\S) lookarounds require whitespace or the start/end of the string on either side of the match.
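As a quick check, both patterns can be run against the sample string from the question. This is a minimal sketch in Python; the variable names are illustrative only, and Python's re module is assumed as the engine (it supports the lookbehind used by the second pattern).

```python
import re

text = 'A.8. 8.4.2.4.1.2. 9.1. 9. 10.0.1.1. 9 0.1 100. 100.5. A.500'

# Word-boundary version: \b before the first digit or A, \B after the final dot.
boundary_pattern = re.compile(r'\b(?:\d+|A)(?:\.\d+)*\.\B')

# Whitespace-delimited version: whitespace or string edge on both sides.
whitespace_pattern = re.compile(r'(?<!\S)(?:\d+|A)(?:\.\d+)*\.(?!\S)')

boundary_matches = boundary_pattern.findall(text)
whitespace_matches = whitespace_pattern.findall(text)

print(boundary_matches)
# ['A.8.', '8.4.2.4.1.2.', '9.1.', '9.', '10.0.1.1.', '100.', '100.5.']
print(whitespace_matches == boundary_matches)
```

On this input the two patterns agree; they would differ on text where a section number is directly followed by punctuation such as a comma, which only the word-boundary version accepts.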

Related

RegExp - find 1,2,3,6,7,8 and 9th letter from the end of the string

I'm new to regular expressions and trying to figure out which expression would match 1,2,3 and 6,7,8,9th letter in the string, starting from the end of the string. It would also need to include \D (for non-digits), so if 3rd letter from the end is a number it will exclude it.
Example of a string is
Wsd-kaf_23psd_trees32rap
So the result should be:
reesrap
or for
Wsd-kaf_23psd_trees324ap
it would be
reesap
This
(?<=^.{9}).*
gives me last 9 chars, but that's not really what I want.
Does anyone know how I can do that?
Thanks.
You could use an alternation to match either all characters up to the position that has 9 characters left until the end, or any run of consecutive digits:
(?:^.*(?=.{9})|\d+)
See an online demo. Replace with empty string.
(?: - Open non-capture group;
^.* - Any 0+ characters (greedy), up to;
(?=.{9}) - A positive lookahead to assert position is followed by 9 characters;
| - Or;
\d+ - 1+ digits.
If, however, your intention was to match the characters separately, then try:
\D(?=.{0,8}$)
See an online demo. This matches any non-digit that is followed by 0 to 8 characters before the end of the line.
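Both approaches can be tried on the two sample strings from the question. The sketch below assumes Python's re module; the helper names strip_by_replacement and collect_non_digits are made up for illustration.

```python
import re

samples = ['Wsd-kaf_23psd_trees32rap', 'Wsd-kaf_23psd_trees324ap']

def strip_by_replacement(s):
    # Remove everything before the last 9 characters, plus any digit runs.
    return re.sub(r'(?:^.*(?=.{9})|\d+)', '', s)

def collect_non_digits(s):
    # Collect each non-digit that has at most 8 characters after it.
    return ''.join(re.findall(r'\D(?=.{0,8}$)', s))

replaced = [strip_by_replacement(s) for s in samples]
collected = [collect_non_digits(s) for s in samples]

print(replaced)   # ['reesrap', 'reesap']
print(collected)  # ['reesrap', 'reesap']
```

The first function deletes what is not wanted, while the second gathers what is wanted one character at a time; for these inputs they produce the same strings.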

Regex for extracting digits in a string not in a word and not separated by a symbol?

I want to extract an ID from a search query but I don't know the length of the ID.
From this input I want to get the numbers that are not in the words and the numbers that are not separated by symbols.
12 11231390 good123e41 12he12o1 1391389 dajue1290a 12331 12-10 1.2 test12.0why 12+12 12*6 2d1139013 09`29 83919 1
Here I want to return
12 11231390 1391389 12331 83919 1
So far I've tried /\b[^\D]\d*[^\D]\b/gm but I get the numbers in between the symbols and I don't get the 1 at the end.
You could repeatedly match digits between whitespace boundaries. Using a word boundary \b would give you partial matches.
Note that [^\D] is the same as \d and would expect at least a single character.
Your pattern can be written as \b\d\d*\d\b and you can see that you don't get the 1 at the end as your pattern matches at least 2 digits.
(?<!\S)\d+(?:\s+\d+)*(?!\S)
The pattern matches:
(?<!\S) Negative lookbehind, assert a whitespace boundary to the left
\d+(?:\s+\d+)* Match 1+ digits and optionally repeat matching 1+ whitespace chars and 1+ digits.
(?!\S) Negative lookahead, assert a whitespace boundary to the right
Regex demo
If lookarounds are not supported, you could use a match with a capture group
(?:^|\s)(\d+(?:\s+\d+)*)(?:$|\s)
Regex demo
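To see what these two patterns return on the input from the question, here is a small sketch assuming Python's re module. Note that because the patterns allow whitespace between digit groups, neighbouring IDs come back as one match, so the sketch splits each match on whitespace afterwards; that splitting step is an addition for illustration, not part of the answer's patterns.

```python
import re

query = ("12 11231390 good123e41 12he12o1 1391389 dajue1290a 12331 "
         "12-10 1.2 test12.0why 12+12 12*6 2d1139013 09`29 83919 1")

# Lookaround version: each match is a run of whitespace-separated numbers.
runs = re.findall(r'(?<!\S)\d+(?:\s+\d+)*(?!\S)', query)

# Split the runs to get the individual IDs.
ids = [number for run in runs for number in run.split()]

# Capture-group version for engines without lookarounds;
# findall returns the contents of group 1.
fallback_runs = re.findall(r'(?:^|\s)(\d+(?:\s+\d+)*)(?:$|\s)', query)

print(runs)  # ['12 11231390', '1391389', '12331', '83919 1']
print(ids)   # ['12', '11231390', '1391389', '12331', '83919', '1']
```

On this input the capture-group version yields the same runs as the lookaround version.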

Regex match checksum with or without dashes

To match a dash-less checksum I can do something like:
\b[0-9a-z]{32}\b
However, I'm seeing some checksums that also have dashes, such as:
d3bd55bf-062f-473b-9417-935f62c4c98a
While this is probably a fixed size, 8, then 4, then 4, then 4, then 12, I was wondering if I could do a regex where the number of non-dash digits adds up to 32. I think the answer is no, but hopefully some regex wizard can come up with something.
Here is a starting point for some sample inputs: https://regex101.com/r/K0IMKe/1.
You can use
\b[0-9a-z](?:-?[0-9a-z]){31}\b
See the regex demo.
It matches
\b - a word boundary
[0-9a-z] - a digit or a lowercase ASCII letter
(?:-?[0-9a-z]){31} - thirty-one repetitions of an optional - followed with a single digit or a lowercase ASCII letter
\b - a word boundary.
If you do not mind the match including a trailing - when a word char follows that hyphen, you may also use
\b(?:[0-9a-z]-?){32}\b
See this regex demo. Here, (?:[0-9a-z]-?){32} will match thirty-two repetitions of a digit or lowercase ASCII letter followed with an optional hyphen.
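The difference between the two patterns shows up when a hyphen and another word character follow the checksum. The following is a minimal sketch assuming Python's re module; the names strict and loose, and the surrounding test text, are invented for illustration.

```python
import re

dashed = 'd3bd55bf-062f-473b-9417-935f62c4c98a'
dashless = dashed.replace('-', '')

# First pattern: a hyphen is only allowed between two characters.
strict = re.compile(r'\b[0-9a-z](?:-?[0-9a-z]){31}\b')
# Second pattern: a hyphen may follow every character, including the last.
loose = re.compile(r'\b(?:[0-9a-z]-?){32}\b')

text = f'ok {dashed} ok {dashless} short d3bd55bf-062f'
strict_matches = strict.findall(text)
print(strict_matches == [dashed, dashless])  # True

# A hyphen plus a word char right after the checksum.
trailing = dashed + '-x'
strict_trailing = strict.search(trailing).group()
loose_trailing = loose.search(trailing).group()
print(strict_trailing == dashed)       # True: stops before the hyphen
print(loose_trailing == dashed + '-')  # True: includes the trailing hyphen
```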
If there can be multiple dashes, you can assert 32 to 36 chars using a positive lookahead.
^(?=[a-z0-9-]{32,36}$)[a-z0-9]+(?:-[a-z0-9]+)*$
^ Start of string
(?=[a-z0-9-]{32,36}$) Positive lookahead, assert that what is to the right is 32 to 36 repetitions of the listed characters
[a-z0-9]+ Match 1+ times any of the listed
(?: Non capture group
-[a-z0-9]+ Match a - followed by 1+ times any of the listed (the string can not end with a hyphen)
)* Close the group and match 0+ times to also match the string without dashes
$ End of string
Regex demo
If you want to limit the number of dashes to between 0 and 4, you can change the quantifier * to {0,4}+ (the trailing + makes it possessive, so the engine must support possessive quantifiers)
^(?=[a-z0-9-]{32,36}$)[a-z0-9]+(?:-[a-z0-9]+){0,4}+$
Regex demo
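A short sketch of the lookahead approach, assuming Python's re module. Possessive quantifiers are only available in Python 3.11 and later, so this sketch uses a plain {0,4} instead; that is a substitution, and it should accept the same strings here because the pattern is anchored with $. The variable names and the five_dashes test string are made up for illustration.

```python
import re

length_checked = re.compile(r'^(?=[a-z0-9-]{32,36}$)[a-z0-9]+(?:-[a-z0-9]+)*$')
# Plain {0,4} instead of the possessive {0,4}+ (see the note above).
max_four_dashes = re.compile(r'^(?=[a-z0-9-]{32,36}$)[a-z0-9]+(?:-[a-z0-9]+){0,4}$')

dashed = 'd3bd55bf-062f-473b-9417-935f62c4c98a'
dashless = dashed.replace('-', '')
ends_with_dash = dashless + '-'          # 33 chars, but ends with a hyphen
five_dashes = 'a-b-c-d-e-' + 'f' * 26    # 36 chars containing 5 hyphens

accepted = [s for s in (dashed, dashless, ends_with_dash, five_dashes, 'abc-def')
            if length_checked.search(s)]
accepted_max_four = [s for s in (dashed, dashless, ends_with_dash, five_dashes, 'abc-def')
                     if max_four_dashes.search(s)]

print(accepted == [dashed, dashless, five_dashes])  # True
print(accepted_max_four == [dashed, dashless])      # True
```

Keep in mind that the lookahead checks the total length including hyphens, so this variant does not guarantee exactly 32 non-dash characters.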

Finding words in a string that start with number (Regex)

I need to find words in a string that start with a number (i.e. a digit).
In following string:
1st 2nd 3rd a56b 5th 6th ***7th
The words 1st 2nd 3rd 5th 6th should be returned.
I tried with the regex:
(\b[^ a-zA-Z ^ *]+(th|rd|st|nd))+
This regex returns the words that do not start with letters, but it can't handle the cases where a word starts with special characters.
For the current string, you may use a pattern like
(?<!\S)\d+(?:th|rd|st|nd)\b
See the regex demo
The pattern matches:
(?<!\S) - a location at the start of a string or after a whitespace
\d+ - 1 or more digits
(?:th|rd|st|nd) - one of the four alternatives
\b - a word boundary.
If you plan to match any 0+ non-whitespace chars after a digit that is preceded with a whitespace or is at the start of a string, use
(?<!\S)\d\S*
where \S* will match any 0+ non-whitespace chars.
See this regex demo.
NOTE: In case the lookbehind is not supported, replace (?<!\S) with (?:^|\s) and also wrap the rest of the pattern with a capturing group to access the latter later:
(?:^|\s)(\d\S*)
and the value will be in Group 1.
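The three patterns from this answer can be compared on the string from the question. This is a minimal sketch assuming Python's re module, with illustrative variable names.

```python
import re

text = '1st 2nd 3rd a56b 5th 6th ***7th'

# Digits followed by an ordinal suffix, at the start of a whitespace-delimited word.
ordinals = re.findall(r'(?<!\S)\d+(?:th|rd|st|nd)\b', text)

# Any whitespace-delimited word that starts with a digit.
digit_words = re.findall(r'(?<!\S)\d\S*', text)

# Lookbehind-free variant; findall returns the contents of Group 1.
digit_words_no_lookbehind = re.findall(r'(?:^|\s)(\d\S*)', text)

print(ordinals)     # ['1st', '2nd', '3rd', '5th', '6th']
print(digit_words)  # ['1st', '2nd', '3rd', '5th', '6th']
```

Neither a56b nor ***7th is returned, because the digit in each is preceded by a non-whitespace character.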
To get words that start with a digit and end with th/st/nd/rd, you can try this:
((?<!\S)(\d+)(th|rd|nd|st))
(?<!\S) detects the word's starting position
\d+ matches 1 or more digits
th|rd|st|nd matches one among those 4.
You can check it here

Working with regex for alphanumeric

I'm trying to write a regex for an alphanumeric string of length 7 (with positions 1, 3, 4 as letters and positions 2, 5, 6, 7 as digits).
[a-zA-Z]|[0-9]|[a-zA-Z]|[a-zA-Z]|[0-9]|[0-9]|[0-9]
Can someone help me?
The sequence "character, digit, character, character, digit, digit, digit" is expressed in regex as
[a-zA-Z][0-9][a-zA-Z]{2}[0-9]{3}
If you're working in PCRE (with say, PHP):
^([a-zA-Z])([0-9])(?1){2}(?2){3}$
Breakdown:
^ - from the start of the string
([a-zA-Z]) - match and capture a single character in the ranges given: a-z, A-Z
([0-9]) - match and capture a single character in the ranges given: 0-9
(?1){2} - reapply the pattern of the first group twice (a subpattern call: it reuses the group's pattern, not the text it captured)
(?2){3} - reapply the pattern of the second group 3 times (also a subpattern call)
$ - the end of the string
If you want to match this in the middle of a sentence, exchange ^ and $ for \b - which will match a word boundary
See the demo
If you're not using PCRE:
^[a-zA-Z][0-9][a-zA-Z]{2}[0-9]{3}$
This does the same thing, but involves some repetition in the pattern.
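Python's built-in re module does not support the (?1) syntax, so a sketch in Python would use the non-PCRE form. The helper name is_valid_code and the sample inputs below are hypothetical; re.fullmatch is used here in place of the ^ and $ anchors.

```python
import re

# letter, digit, letter, letter, digit, digit, digit
code_pattern = re.compile(r'[a-zA-Z][0-9][a-zA-Z]{2}[0-9]{3}')

def is_valid_code(s):
    # fullmatch requires the whole string to fit the pattern.
    return code_pattern.fullmatch(s) is not None

print(is_valid_code('A1bc234'))  # True
print(is_valid_code('A1b2345'))  # False: position 4 must be a letter

# With \b instead of anchors, the pattern can be found inside a sentence.
embedded = re.findall(r'\b[a-zA-Z][0-9][a-zA-Z]{2}[0-9]{3}\b',
                      'ids: A1bc234, x9YZ000 and A1b2345')
print(embedded)  # ['A1bc234', 'x9YZ000']
```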