match groups of words and digits regex in any order - regex

suppose you have the following string:
"7 apples and 13 oranges"
/(\d+).*?(apples)/i
The above regex matches 7 apples, but if you change the order and the numbers to "45 oranges and 9 apples", it matches the first number, 45, rather than the number that belongs to apples, which is the one I want.
How can I write a regex to match and return match groups of digits + apples if I write the sentence in the following two orders:
"7 apples and 13 oranges"
"13 oranges 52 apples"
i.e., I'd like to match 7 apples, with the match groups 7 and apples, AND 52 apples, with the match groups 52 and apples.

Where did /(\d+).*?(apples)/i go wrong?
Even though .*? is lazy, it still matches everything from the first digit up to the next "apples",
which means that for the string
"13 oranges 52 apples"
it matches from 13 all the way to the apples at the end of the string, since . matches any character, digits included.
see the link for an illustration : http://regex101.com/r/uL5eX0/2
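A small sketch of that behaviour, assuming Python's re module (the question does not name a language, so the choice of Python is an assumption):

```python
import re

# The asker's original pattern: lazy, but "." still accepts digits and letters.
pattern = re.compile(r"(\d+).*?(apples)", re.IGNORECASE)

m1 = pattern.search("7 apples and 13 oranges")
m2 = pattern.search("13 oranges 52 apples")

print(m1.groups())  # ('7', 'apples')
print(m2.groups())  # ('13', 'apples')
print(m2.group(0))  # '13 oranges 52 apples'
```

The match starts at the first digit the engine finds, and laziness only limits how far it extends to the right.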
How to correct it?
Since the character separating your digits and "apples" is a space, you can use \s instead of . as in
(\d+)\s(apples)
which matches 7 and 52, as seen in http://regex101.com/r/uL5eX0/3
To be on the safe side, you can use
(\d+)\s+(apples)
which allows any number of whitespace characters between the digits and "apples".
A word boundary \b can also be added for extra safety.
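As a rough check of the whitespace-based pattern with a trailing \b, here is a Python sketch; find_apples is a hypothetical helper name, and Python's re module is an assumption:

```python
import re

# Digits, one or more whitespace characters, then the whole word "apples".
pattern = re.compile(r"(\d+)\s+(apples)\b", re.IGNORECASE)

def find_apples(text):
    """Return (number, word) for the first 'N apples' in text, or None."""
    m = pattern.search(text)
    return m.groups() if m else None

print(find_apples("7 apples and 13 oranges"))
print(find_apples("13 oranges 52 apples"))
print(find_apples("3 applesauce"))  # \b rejects the longer word
```

Without the \b, the last input would match as well, because "apples" is a prefix of "applesauce".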

(\d+)(?=\s*(apples))
Try this. Use a positive lookahead. See the demo.
http://regex101.com/r/yG7zB9/17
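With the lookahead version, the overall match is only the number, while "apples" is still available as the second group. A short Python sketch (Python's re module is an assumption here):

```python
import re

# The lookahead checks for "apples" without consuming it.
pattern = re.compile(r"(\d+)(?=\s*(apples))", re.IGNORECASE)

text = "13 oranges 52 apples"
m = pattern.search(text)

print(m.group(0))  # the consumed text: only the digits
print(m.groups())  # both capture groups are still filled in
```

Note that \s* also accepts zero spaces, so "52apples" would match too.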

Use this pattern:
(\d+)\s++(apples\b)
By popular demand from the crowd:
(\d+)\s+(apples\b)
Demo

You could simply use \D*? instead of .*?, since . would also match the in-between digits but \D wouldn't.
(\d+)\D*?(apples)
\D*? is a non-greedy match of any non-digit character, zero or more times.
DEMO
What's wrong with your regex?
(\d+).*?(apples)
The regex engine tries to match the pattern from left to right. So \d+ matches the first number, and .*?(apples) then forces the engine to consume every character up to the string apples. Use \D*? instead of .*? so that the engine can only cross non-digit characters, zero or more times.
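A quick Python sketch of the \D*? variant (Python's re module and the helper name number_before_apples are assumptions made for illustration):

```python
import re

# \D*? may cross letters and spaces, but it cannot step over another number.
pattern = re.compile(r"(\d+)\D*?(apples)", re.IGNORECASE)

def number_before_apples(text):
    """Return (number, word) for the number closest before 'apples', or None."""
    m = pattern.search(text)
    return m.groups() if m else None

print(number_before_apples("13 oranges 52 apples"))
print(number_before_apples("7 apples and 13 oranges"))
print(number_before_apples("7 pears and apples"))
```

The last call still pairs 7 with apples, because no digit sits between them. Whether that is acceptable depends on the input you expect.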

Related

regex to extract housenumber plus addition

I'm looking for a regex that matches housenumbers combined with additions for all addresses below:
Breestraat 4
Breestraat 45
Breestraat 456
Dubbele Straat 4a
Dubbele Straat 4-a
5 meistraat 1a
5meistraat 12
5meistraat 12a
Teststraat 22-III
Now the following regex works, except in the first case. The single-digit house number is missed because of the first \d in the regex (which prevents a leading digit from being captured).
\d?.(\d+.+)$
I'm scratching my head over how to get the house number '4' for the first line. So basically, how do I change "skip the leading digit" into "skip the leading digit, but still let it end up in the capturing group"?
You can use
\d+\D*$
\d+\S*$
See the regex demo #1 and regex demo #2.
The pattern matches
\d+ - one or more digits
\D* - zero or more non-digit chars
\S* - zero or more non-whitespace chars
$ - end of string.
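Both patterns can be tried against the addresses from the question. This is a sketch in Python (an assumption, since the question names no language), and house_number is a hypothetical helper:

```python
import re

addresses = [
    "Breestraat 4", "Breestraat 45", "Breestraat 456",
    "Dubbele Straat 4a", "Dubbele Straat 4-a",
    "5 meistraat 1a", "5meistraat 12", "5meistraat 12a",
    "Teststraat 22-III",
]

by_non_digits = re.compile(r"\d+\D*$")  # digits, then non-digits to the end
by_non_space = re.compile(r"\d+\S*$")   # digits, then non-whitespace to the end

def house_number(address, pattern=by_non_space):
    """Return the trailing house number plus addition, or None."""
    m = pattern.search(address)
    return m.group(0) if m else None

results = [house_number(a) for a in addresses]
print(results)
```

The two patterns agree on these inputs. They differ when the addition is separated by a space: for "Main St 12 b", the \D* version returns "12 b" while the \S* version finds nothing.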
It's not entirely clear what you are asking for.
Anyway, this pattern matches the house number at the end of the string:
\d+[-\da-zI]*$
https://regexr.com/6l0g7
Anyway I'm aware this is not a valid answer

Regex: Identify second character in a string should be equal to number(9)

I'm looking to check that the second digit is 9 in a string of 19 digits. How can I do that using regex? Please let me know.
Ex: 8934567890098765438
Identify that the 2nd character is the digit 9, and that the length of the string is greater than 18.
I have tried (?!^.[9])[0-9]{18}, [^.[9][0-9]{18} different ways, but am not getting the right one.
You can use a capture group to capture the digit 9, and match 17 or more digits after it so that there are at least 19 digits in total:
^\d(9)\d{17,}$
Regex demo
Or using a positive lookbehind, matching only a 9:
(?<=^\d)9(?=\d{17,}$)
Regex demo
^(?=.9)\d{19,}$
^(?=.9) : Second character is 9.
\d{19,}$: 19 or more digits
Use
^\d9\d{17}$
See regex proof.
With lookahead:
^(?=\d9)\d{19}$
See another regex proof.
When matching a substring of a longer string:
\b\d9\d{17}\b
(\b is a word boundary).
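The variants above differ mainly in whether they accept more than 19 digits and whether the number may sit inside a longer text. A Python sketch to compare them (Python's re module and the variable names are assumptions):

```python
import re

at_least_19 = re.compile(r"^(?=.9)\d{19,}$")  # second char is 9, 19+ digits
exactly_19 = re.compile(r"^\d9\d{17}$")       # second digit is 9, exactly 19
embedded = re.compile(r"\b\d9\d{17}\b")       # same, inside a longer string

sample = "8934567890098765438"  # the 19-digit example from the question

print(bool(at_least_19.search(sample)))
print(bool(exactly_19.search(sample)))
print(bool(exactly_19.search(sample + "1")))  # 20 digits
print(embedded.search("id: " + sample + " ok").group(0))
```

Which one fits depends on whether "greater than 18" in the question means exactly 19 or 19 and up.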

How to group expressions to be matched as one?

What I am trying to match looks like this:
char-char-int-int-int
char-char-char-int-int-int
char-char-int-int-int-optionalValue (optionalValue being a "-" plus letters after it)
My current regexp looks like this:
([A-Za-z]{1,2})([1-9]{3})("-"[\w])
In the end, the regexp should match any of these:
AB001
aB999
Hm000
en789
rv005-ab
These should be invalid:
ab (because only letters)
abcfr (because too many letters)
158 (because only numbers)
78532 (because too many numbers)
123ab (because all letters should come before numbers, optionalValue excepted)
a1b23 (because letters and numbers are mixed)
What am I doing wrong? (Please be gentle, this is my first post ever on Stack Overflow.)
If you use [A-Za-z]{1,2}, the second example would not match, as it has 3 letters (char-char-char).
Using \w would also match digits and an underscore. If you mean only letters a-zA-Z, you can put them in an optional group preceded by a hyphen: (?:-[a-zA-Z]+)?
You could use
^[a-zA-Z]{2,3}[0-9]{3}(?:-[a-zA-Z]+)?$
^ Start of string
[a-zA-Z]{2,3} Match 2 or 3 times a char A-Za-z
[0-9]{3} Match 3 digits
(?:-[a-zA-Z]+)? Optionally match a - and 1 or more chars A-Za-z
$ End of string
Regex demo
Or using word boundaries \b instead of anchors
\b[a-zA-Z]{2,3}[0-9]{3}(?:-[a-zA-Z]+)?\b
Regex demo
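The anchored pattern can be checked against the valid and invalid samples from the question. A minimal Python sketch (Python and the helper name is_valid are assumptions):

```python
import re

# 2-3 letters, exactly 3 digits, then an optional "-letters" suffix.
pattern = re.compile(r"^[a-zA-Z]{2,3}[0-9]{3}(?:-[a-zA-Z]+)?$")

valid = ["AB001", "aB999", "Hm000", "en789", "rv005-ab"]
invalid = ["ab", "abcfr", "158", "78532", "123ab", "a1b23"]

def is_valid(code):
    """True when the whole string has the letters-digits(-letters) shape."""
    return pattern.search(code) is not None

print([is_valid(c) for c in valid])
print([is_valid(c) for c in invalid])
```

The first list should be all True and the second all False.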
I have corrected your regex below. Please give it a try.
([A-Za-z]{1,2})([0-9]{3})(-\w*)?
Demo

vba regular expression last occurrence

I would like to match the "775" (representing the last 3 digit number with an unknown total number of occurrences) within the string "one 234 two 449 three 775 f4our" , with "f4our" representing an unknown number of characters (letters, digits, spaces, but not 3 or more digits in a row).
I came up with the regular expression "(\d{3}).*?$" thinking the "?" would suffice to get the 775 instead of the 234, but this doesn't seem to work.
Is there any way to accomplish this using VBA regular expressions?
Note that (\d{3}).*?$ just matches and captures into Group 1 the first 3 consecutive digits and then matches any 0+ characters other than a newline up to the end of the string.
You need to get the 3 digit chunk at the end of the string that is not followed with a 3-digit chunk anywhere after it.
You may use a negative lookahead (?!.*\d{3}) to impose a restriction on the match:
\d{3}(?!.*\d{3})
See the regex demo. Or - if the 3 digits are to be matched as whole word:
\b\d{3}\b(?!.*\b\d{3}\b)
See another demo
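The question is about VBA, but the same two patterns can be illustrated in Python. This is only a sketch of how the lookahead behaves, not VBA code, and the variable names are made up for the example:

```python
import re

last_run = re.compile(r"\d{3}(?!.*\d{3})")           # last 3 consecutive digits
last_word = re.compile(r"\b\d{3}\b(?!.*\b\d{3}\b)")  # last standalone 3-digit number

text = "one 234 two 449 three 775 f4our"
print(last_run.search(text).group(0))
print(last_word.search(text).group(0))

# A longer number after 775 shows where the two patterns differ.
longer = text + " 12345"
print(last_run.search(longer).group(0))
print(last_word.search(longer).group(0))
```

With the longer tail, the first pattern picks three digits out of 12345, while the word-boundary version still returns 775.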

Reg Exp: match specific number of characters or digits

My RegExp is very rusty! I have two questions, related to the following RegExp
Question Part 1
I'm trying to get the following RegExp to work
^.*\d{1}\.{1}\d{1}[A-Z]{5}.*$
What I'm trying to pass is x1.1SMITHx or x1.1.JONESx
Where x can be anything of any length but the SMITH or JONES part of the input string is checked for 5 upper case characters only
So:
some preamble 1.1SMITH some more characters 123
xyz1.1JONES some more characters 123
both pass
But
another bit of string1.1SMITHABC some more characters 123
xyz1.1ME some more characters 123
Should not pass because SMITH now contains 3 additional characters, ABC, and ME is only 2 characters.
I only pass if after 1.1 there are 5 characters only
Question Part 2
How do I match on specific number of digits ?
Not bothered what they are, it's the number of them that I can't get working
if I use ^\d{1}$ I'd have thought it'll only pass if one digit is present
It will pass 5 but it also passes 67
It should fail 67 as it's two digits in length.
The RegExp should pass only if 1 digit is present.
For the first one, check out this regex:
^.*\d\.\d[A-Z]{5}[^A-Z]*$
Before solving the problem, I made it easier to read by removing all of the {1}. This quantifier is unnecessary, since each token matches exactly once by default (/abc/ matches abc, not aaabbbccc).
To fix the issue, we just need to replace your final .*, which matches zero or more of any character. If we make it more specific (i.e. [^A-Z]*), the pattern no longer matches SMITHABC.
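Here is a Python sketch that runs this pattern over the four sample strings from the question. The second pattern, which uses a negative lookahead, is an alternative added here for comparison and is not part of this answer; Python itself is also an assumption.

```python
import re

# The answer's pattern: nothing after the 5 letters may be uppercase.
strict_tail = re.compile(r"^.*\d\.\d[A-Z]{5}[^A-Z]*$")
# Assumed variant: only the character right after the 5 letters is restricted.
next_char_only = re.compile(r"^.*\d\.\d[A-Z]{5}(?![A-Z]).*$")

samples = [
    "some preamble 1.1SMITH some more characters 123",
    "xyz1.1JONES some more characters 123",
    "another bit of string1.1SMITHABC some more characters 123",
    "xyz1.1ME some more characters 123",
]

print([bool(strict_tail.search(s)) for s in samples])
print([bool(next_char_only.search(s)) for s in samples])
```

One caveat with [^A-Z]*$ is that it rejects any uppercase letter later in the string, for example "x1.1SMITH Some text". The lookahead variant accepts that input.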
I came up with a number of solutions, but I like these the most. If your RegEx engine supports negative look-ahead and negative look-behind, you can use this:
Part 1: (?<![A-Z])[A-Z]{5}(?![A-Z])
Part 2: (?<!\d)\d(?!\d)
Both have a pattern of (?<!expr)expr(?!expr).
(?<!...) is a negative look-behind, meaning the match isn't preceded by the expression in the bracket.
(?!...) is a negative look-ahead, meaning the match isn't followed by the expression in the bracket.
So: for the first pattern, it means "find 5 uppercase characters that are neither preceded nor followed by another uppercase character". In other words, match exactly 5 uppercase characters.
The second pattern works the same way: find a digit that is not preceded or followed by another digit.
You can try it on Regex 101.
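Both lookaround patterns can be exercised with a short Python sketch (Python's re supports fixed-width look-behind; the language choice and the variable names are assumptions):

```python
import re

five_upper = re.compile(r"(?<![A-Z])[A-Z]{5}(?![A-Z])")  # a run of exactly 5 capitals
single_digit = re.compile(r"(?<!\d)\d(?!\d)")            # a digit with no digit neighbours

print(five_upper.search("xyz1.1JONES some more").group(0))
print(five_upper.search("string1.1SMITHABC some more"))  # 8 capitals in a row
print(single_digit.search("5").group(0))
print(single_digit.search("67"))
print(single_digit.findall("a1 b22 c3"))
```

In Python at least, the asker's own ^\d{1}$ already rejects 67 when used with re.search, so the behaviour described in Part 2 may come from how the tool in question applies the pattern.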