Regex matching multiple inputs - regex

I am trying to do a smart input field for UK style weight input, e.g. "6 stone and 3 lb" or "6 st 11 pound", capturing the 2 numbers in groups.
For now I got: ([0-9]{1,2}).*?([0-9]{1,2}).*
Problem is it matches "12 stone" in 2 groups, 1 and 2 instead of just 12. Is it possible to make a regex which captures correctly in both cases?

You need to make the first part possessive so it never gets backtracked into.
([0-9]{1,2}+).*?([0-9]{1,2})

Because . matches everythig including numbers.. try this:
/(\d{1,2})\D+(\d{1,2})?/

Something like this?
\b(\d+)\b.*?\b(\d+)\b
Groups 1 and 2 will have your numbers in either case.
Explanation :
"
\b # Assert position at a word boundary
( # Match the regular expression below and capture its match into backreference number 1
\d # Match a single digit 0..9
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
\b # Assert position at a word boundary
. # Match any single character that is not a line break character
*? # Between zero and unlimited times, as few times as possible, expanding as needed (lazy)
\b # Assert position at a word boundary
( # Match the regular expression below and capture its match into backreference number 2
\d # Match a single digit 0..9
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
\b # Assert position at a word boundary
"

This works, then look at capture groups 1 and 3:
([0-9]{1,2})[^0-9]+(([0-9]{1,2})?.+)?
The idea is to make a number and text manditory, but make a second number and text optional.

Here is my suggestion for a regex to match both variants you showed:
(?<stone>\d+\s(?:stone|st))(?:\s(and)?\s?)(?<pound>\d+\s(?:pound|lb))

It's a bit vague at the moment, this works:
/([0-9]{1,2})(?:[^0-9]+([0-9]{1,2}).*)?/
for this data:
6 stone and 3 lb
6 st 11 pound
12 stone
12 st and 11lbs

Seeing as everyone is having a go, here's mine:
(\d+)(?:\D+(\d+)?)
It's definitely the concisest so far. This will match one or two groups of digits anywhere:
"12": ("12", null)
"12st": ("12", null)
"12 st": ("12", null)
"12st 34 lb": ("12", "34")
"cabbage 12st 34 lb": ("12", "34")
"12 potato 34 moo": ("12", "34")
The next step would be making it catch the name of the units that were used.
Edit: as pointed out above, we don's know what language you're using, and not all regex functionality is available in all implementations. However as far as I know, \d for digits and \D for non-digits is fairly universal.

Related

RegEx to check 24 hours time format fails

I have the following RegEx that is supposed to do 24 hours time format validation, which I'm trying out in https://rubular.com
/^[0-23]{2}:[0-59]{2}:[0-59]{2}$/
But the following times fails to match even if they look correct
02:06:00
04:05:00
Why this is so?
In character classes, you're supposed to denote the range of characters allowed (in contrast to the numbers you want to match in your example). For minutes and seconds, this is relatively straight-forward - the following expression
[0-5][0-9]
...will match any numerical string from "00" to "59".
But for the hours, you need to two separate expressions:
[01][0-9]|2[0-3]
...one to match "00" to "19" and one to match "20" to "23". Due to the alternative used (| character), these need to be grouped, which adds another bit of syntax (?:...). Finally we're just adding the anchors ^ and $ for beginning and end of string, which you already had where they belong.
^(?:[01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]$
You can check this solution out at regex101, if you like.
Your problem is that you understand characters ranges wrong: 0-23 doesn't mean "match any number from 0 to 23", it means: 0-2- match one digit: 0,1 or 2, then match 3.
Try this pattern: (?:[01][0-9]|2[0-3])(?::[0-5][0-9]){2}
Explanation:
(?:...) - non-capturing group
[01][0-9]|2[0-3] - alternation: match whether 0 or one followed by any digits fro 0 to 9 OR 2 followed by 0, 1, 2 or 3 (number from 00-23)
(?::[0-5][0-9]){2} - match : and [0-5][0-9] (basically number from 00-59) twice
Demo
use this (([0-1]\d|[2][0-3])):(([0-5][0-9])):(([0-5][0-9]))
Online demo

use ultraedit find and replace Perl regex to insert colon into 4 digit time string

I have multiple 24-hour time strings through several files. For example, 1234, which I wish to replace with 12:34.
Finding them is easy, just \d\d\d\d, that I understand and it works. However, what replace string do I need. In other words, say xx:xx, what do I put in place of each x.
I've tried numbers of things to no avail. I'm obviously not understanding how I get it to remember the digits it found and to recall them in the replace string.
If in your example data 4 digits represent 24 hour time strings you could match 2 capturing groups between word boundaries to prevent a match with more then 4 digits. You can Adjust the word boundaries to your requirements.
Match
\b(\d{2})(\d{2})\b
Replace
group1:group2 \1:\2
Explanation
\b Match a word boundary
(\d{2}) Capture in a group 2 digits
(\d{2}) Capture in a group 2 digits
\b Match a word boundary
Note
Matching 4 digits does not verify a valid 24 hour time. You could match that using for example \b([01][0-9]|2[0-3])([0-5][0-9])\b and replace with \1:\2

How can I replace this expression in chain regex (notepad++)?

i have this text
14 two 25 three 12 four 40 five 10
I want to obtain "14 two 14 25 three 14 25 12 four 14 25 12 40 five 14 25 12 40 10"
For example, when I replace (14 two ) for (14 two 14 ) this start after of 14 I can't start it after two.
Is there any other alternative to do?
For example using a group that is not included in match ( a group before match ) for replace it ?
please help me
This should do the trick for you:
Regex: ((?:\s?\d+\s?)+)((?:[a-zA-Z](?![^a-zA-Z]+\1))+)
Replacement: $1$2 $1
You will need to click on the "replace all" button for this to work (it cannot be done in one shot, it has to be repeated as long as it can find match. Online PHP example)
Explanation:
\s: Match a single space character
?: the previous expression must be matched 0 or 1 time.
\s?: Match a space character 0 or 1 time.
\d: Match a digit character (the equivalent of [0-9]).
+: The previous expression must be matched at least one time (u to infinite).
\d+: Match as much digit characters as you (but at least one time).
(): Capture group
(?:): Non-capturing group
((?:\s?\d+\s?)+): Match an optional space character followed by one or more digit characters followed by an optional space character. The expression is surrounded by a non-capturing group followed by a plus. That mean that the regex will try to match as much combination of space and digit character as it can (so you can end up with something like '14 25 12 40').
The capture group is meant to keep the value to reuse it in the replacement.You cannot simply add the plus at the end of the capture group without the non-capturing group within because it would only remember the last digits capture ('12' instead of the whole '14 25 12' use to build '14 25 12 40').
[a-zA-Z]: Match any English letters in any case (lower, upper).
\1: reference to what have been capture in the first group.
(?!): Negative lookahead.
[^]: Negative character class, so [^a-zA-Z] means match anything
((?:[a-zA-Z](?![^a-zA-Z]+\1))+): The negative lookahead is meant to make sure that we don't always end up matching the first "14 two" in the input text. Without it, we would end up in an infinite loop giving results as "14 two 14 14 14 14 14 14 25 three 12 four 40 five 10" (the "14" before "25" being repeated until you reach the timeout).
Basically, for every English letter we match, we lookahead to assert that the content of the first capture group (by example "14") is not present in our digit sequence.
For the replacement, $1$2 $1 means put the content of the capture group 1 and 2, add a space and put the content of the capture group 1 once more.

Regular expression to extract the year from a string

This should be simple but i cant seem to get it going. The purpose of it is to extract v3 tags from mp3 file names in Mp3tag.
I have these strings I want to extract the year.
Test String 1 (1994) -> extract 1994
34 Test String 2 (1995)" -> extract 1995
Test (String) 3 (1996)" -> extract 1996
I had ^(.+)\s\(([0-9]*)\)$ but obviously its not giving me the results i was expecting. You can say that im not very good with regular expressions.
Thanks in advance
A suggestion for a more generic solution, not sure if that is what you need. Valid years will always have the form 19xx or 20xx, and the years will be separated with a word-break character (something other than a number or a letter):
\b(19|20)\d{2}\b
This doesn't really care where in the tag the year appears. A simpler version that doesn't assume anything more than 4 digits in the year would be this expression:
\b\d{4}\b
The key here is the \b escape sequence, which matches any non-word character (word charaters are letters, digits and underscores), including parenthesis, of course.
Would also like to recommend this site:
http://www.regular-expressions.info/
You can use something like this \((\d{4})\)$. The first group will have your match.
Explanation
\( # Match the character “(” literally
( # Match the regular expression below and capture its match into backreference number 1
\d # Match a single digit 0..9
{4} # Exactly 4 times
)
\) # Match the character “)” literally
$ # Assert position at the end of a line (at the end of the string or before a line break character)
You need to escape the parentheses. Also you can restrict that a year has only got 4 numbers:
^(.+)\s\(([0-9]{4})\)$
The year is in matchgroup 2.
I'd go with
^(.*)\s\(([0-9]{4})\)$
(assuming all years have 4 digits, use [0-9]+ if you have an unknown number of digits, but at least one, or [0-9]* if there could be no digits)
You're almost there with your regular expression.
What you really need is:
\s\((\d{4})\)$
Where:
\s is some whitespace
\( is a literal '('
( is the start of the match group
\d is a digit
{4} means four of the previous atom (i.e. four digits)
) is the end of the match group
\) is a literal ')'
$ is the end of the string
For best results, put into a function:
>>> def get_year(name):
... return re.search('\s\((\d{4})\)$', name).groups()[0]
...
>>> for name in "Test String 1 (1994)", "34 Test String 2 (1995)", "Test (String) 3 (1996)":
... print get_year(name)
...
1994
1995
1996

Regex for password that requires one numeric or one non-alphanumeric character

I'm looking for a rather specific regex and I almost have it but not quite.
I want a regex that will require at least 5 charactors, where at least one of those characters is either a numeric value or a nonalphanumeric character.
This is what I have so far:
^(?=.*[\d]|[!##$%\^*()_\-+=\[{\]};:|\./])(?=.*[a-z]).{5,20}$
So the problem is the "or" part. It will allow non-alphanumeric values, but still requires at least one numeric value. You can see that I have the or operator "|" between my require numerics and the non-alphanumeric, but that doesn't seem to work.
Any suggestions would be great.
Try:
^(?=.*(\d|\W)).{5,20}$
A short explanation:
^ # match the beginning of the input
(?= # start positive look ahead
.* # match any character except line breaks and repeat it zero or more times
( # start capture group 1
\d # match a digit: [0-9]
| # OR
\W # match a non-word character: [^\w]
) # end capture group 1
) # end positive look ahead
.{5,20} # match any character except line breaks and repeat it between 5 and 20 times
$ # match the end of the input
Perhaps this may work for you:
^.*[\d\W]+.*$
And use some code like this to check string size:
if(str.len >= 5 && str.len =< 20 && regex.ismatch(str, "^.*[\d\W]+.*$")) { ... }
Is it really necessary to stuff everything in a giant regex? Just use program logic (5 ≤ length(s) ≤ 20) ∧ (/[[:digit:]]/ ∨ /[^[:alpha:]]/). Far more readable syntactically and semantically, I think.
Pretty simple solution, once S.Mark got me on the right track, just needed to merge my numeric and non-alphanumeric pieces as one.
Here's the final regex for anyone that's interested:
^(?=.*[\d!##$%\^*()_\-+=\[{\]};:|\./])(?=.*[a-z]).{5,20}$
This will allow any password between 5 and 20 characters and requires at least one letter and one numeric and/or one non-alphanumeric character.
How about like this?
^.*?[\d!##$%\^*()_\-+=\[{\]};:|\./].*$
For the length 5,20 Please use normal strlen function