Regex: match everything after the first dash

I have a string that contains a car's registration (rego) number, like
1FX9JE - 2012 Audi A3 Ambition Sportback MY12 Stronic
I would like to match everything except the rego number, that is, anything after the dash.
The regex I came up with (in PHP) is
\s.[^-]*$
My initial regex matches everything after the dash only if the string contains exactly one dash. For example: https://regex101.com/r/Jao8W0/1
However, if the string has more than one dash, the regex no longer works.
For example: https://regex101.com/r/Jao8W0/2
Is there any way to match everything after the first dash, even when the string contains additional dashes after it?
Thank you

Try this Regex:
^[^-\r\n]+-\s*\K.*$
Explanation:
^ - asserts the start of the string
[^-\r\n]+ - matches 1+ occurrences of any character that is neither a - nor a newline character (\r or \n)
-\s* - matches the first - in the string followed by 0+ whitespace characters
\K - forgets everything matched so far, so only what follows is returned as the match
.* - matches 0+ occurrences of any character other than a newline
$ - asserts the end of the string
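For readers working outside PHP, here is a minimal sketch of the same idea in Python. Python's built-in re module does not support \K, so this sketch swaps it for a capturing group; the function name after_first_dash and the second sample string are illustrative assumptions, not part of the answer.

```python
import re

# Capture-group variant of ^[^-\r\n]+-\s*\K.*$ (Python's re module has no \K).
AFTER_FIRST_DASH = re.compile(r"^[^-\r\n]+-\s*(.*)$")


def after_first_dash(text):
    """Return everything after the first dash, or None if nothing matches."""
    m = AFTER_FIRST_DASH.match(text)
    return m.group(1) if m else None


print(after_first_dash("1FX9JE - 2012 Audi A3 Ambition Sportback MY12 Stronic"))
# A made-up string with a second dash: the later dash stays in the result.
print(after_first_dash("1FX9JE - 2012 Audi A3 - Ambition Sportback"))
```

Because [^-\r\n]+ cannot cross a dash, the pattern can only anchor on the first one, which is why later dashes end up inside the captured text.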

If there is only one space after the dash, you can use this pattern:
(?<=\-\s)(.*)
Otherwise, if there may be more than one space, use this pattern and take group(1) from the match:
(?<=\-)\s*(.*)
(?<=...) ensures that the given pattern will match, ending at the current position in the expression. The pattern must have a fixed width. Does not consume any characters.
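The two lookbehind patterns above can be tried out with a small Python sketch. The sample string with a second dash is made up for illustration; both lookbehinds are fixed-width, which Python's re module accepts.

```python
import re

# Hypothetical sample with a second dash later in the string.
text = "1FX9JE - 2012 Audi A3 - Ambition Sportback"

# Exactly one whitespace character after the dash: the whole match is the result.
one_space = re.search(r"(?<=-\s)(.*)", text)

# Any amount of whitespace after the dash: the result is in group 1.
any_space = re.search(r"(?<=-)\s*(.*)", text)

print(one_space.group(0))
print(any_space.group(1))
```

re.search scans from the left, so the lookbehind first succeeds right after the first dash, and the greedy .* then takes the rest of the line.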

Related

Regex: match only last 2 digits and ignore whitespaces at the end of line

Example:
32-12•
32-12•••
32-12-52••
32-12-53-12
(let's say each bullet point "•" is a whitespace character)
What I have tried is
/(?<=^.*)\d{2}(?= *)$/gm
but it seems to match the last 2 digits only on lines that have no trailing whitespace, like this
32-12•
32-12•••
32-12-52••
32-12-53-[12]
(let's say square brackets mark where the regex matched)
but what I want is to match the last 2 digits and ignore the trailing whitespace, like this
32-[12]•
32-[12]•••
32-12-[52]••
32-12-53-[12]
You can use
\d{2}(?= *$)
See the regex demo. To match any whitespace, replace the literal space with the \s shorthand character class: \d{2}(?=\s*$).
Details:
\d{2} - two digits
(?= *$) - a positive lookahead that requires zero or more spaces followed by the end of string position (or end of line with the m flag) to appear immediately to the right of the current location.
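As a quick check, this Python sketch runs the pattern over the four sample lines from the question. Plain spaces stand in for the bullet points, and the variable names are illustrative assumptions.

```python
import re

# Trailing spaces stand in for the bullet points in the question.
text = "32-12 \n32-12   \n32-12-52  \n32-12-53-12"

# re.MULTILINE makes $ match at the end of every line, not only at the end of the string.
last_two_digits = re.findall(r"\d{2}(?= *$)", text, flags=re.MULTILINE)

print(last_two_digits)  # ['12', '12', '52', '12']
```

The lookahead checks the trailing spaces without consuming them, so only the two digits end up in each match.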

Regex match string 3-6 characters long, at least one letter, no duplicate "-"

I have to match a string that is 3-6 characters long and contains at least one letter. It may consist of letters, numbers and at most one "-".
The "-" must not be at the start or at the beginning.
Match:
string
str-ng
st-ng
s1-1g
st-1g
Do not match:
strings
-string
string-
st--ng
s-tn-g
1111
st
The closest I've gotten is this:
^((?!-.*-)[0-9A-Z]{3,6})$
But this splits the match at the "-", so it matches s-tri but not st-ri, because there aren't 3 chars on each side.
Maybe you can use:
^(?=.*[a-z])(?!-|.*-$|.*-.*-)[a-z\d-]{3,6}$
See the online demo
^ - Start string anchor.
(?=.*[a-z]) - Positive lookahead to make sure there is at least one letter.
(?!-|.*-$|.*-.*-) - Negative lookahead to prevent a hyphen at the start, a hyphen at the end, or more than one hyphen.
[a-z\d-]{3,6} - Three to six characters from the given class.
$ - End string anchor.
Note that I used the case-insensitive flag.
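One way to check a candidate like this is to run it against the two lists from the question. The following is a small Python sketch of that; the names PATTERN, should_match and should_not_match are illustrative, and re.IGNORECASE plays the role of the case-insensitive flag mentioned above.

```python
import re

PATTERN = re.compile(r"^(?=.*[a-z])(?!-|.*-$|.*-.*-)[a-z\d-]{3,6}$", re.IGNORECASE)

# The two lists are copied from the question.
should_match = ["string", "str-ng", "st-ng", "s1-1g", "st-1g"]
should_not_match = ["strings", "-string", "string-", "st--ng", "s-tn-g", "1111", "st"]

for s in should_match + should_not_match:
    print(f"{s!r}: {bool(PATTERN.search(s))}")
```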
You can use
^(?=.{3,6}$)(?=[^a-zA-Z]*[A-Za-z])[0-9a-zA-Z]+(?:-[0-9a-zA-Z]+)?$
See the regex demo. Details:
^ - start of string
(?=.{3,6}$) - string must contain three to six chars other than line break chars
(?=[^a-zA-Z]*[A-Za-z]) - there must be at least one ASCII letter in the string
[0-9a-zA-Z]+ - one or more alphanumeric ASCII chars
(?:-[0-9a-zA-Z]+)? - an optional sequence of - and then one or more alphanumeric ASCII chars
$ - end of string.
Looking at the pattern that you tried, you meant to exclude the match when there are 2 hyphens present using the negative lookahead.
Also this part [0-9A-Z]{3,6} does not match a hyphen.
Reading
The "-" must not be at the start or at the beginning.
You might do that using
^(?![^\n-]*-[^\n-]*-)(?=[^a-zA-Z\n]*[a-zA-Z])[a-zA-Z0-9][a-zA-Z0-9-]{2,5}$
If you meant also no - at the end:
^(?![^\n-]*-[^\n-]*-)(?=[^a-zA-Z\n]*[a-zA-Z])[a-zA-Z0-9][a-zA-Z0-9-]{1,4}[a-zA-Z0-9]$
Explanation
^ Start of string
(?![^\n-]*-[^\n-]*-) Assert that the string does not contain 2 hyphens
(?=[^a-zA-Z\n]*[a-zA-Z]) Assert that there is at least one char a-zA-Z
[a-zA-Z0-9] Match one of the listed chars (no -)
[a-zA-Z0-9-]{1,4} Repeat 1-4 times any of the listed chars, including -
[a-zA-Z0-9] Match one of the listed chars (no -)
$ End of string
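The difference between the two patterns in this answer only shows up for a string that ends in a hyphen and still fits the length limit. Here is a hedged Python sketch; the constant names and the extra test strings "strin-" and "-strin" are made up for illustration.

```python
import re

# First pattern: a hyphen may not be the first char, but may be the last one.
NO_LEADING = re.compile(
    r"^(?![^\n-]*-[^\n-]*-)(?=[^a-zA-Z\n]*[a-zA-Z])[a-zA-Z0-9][a-zA-Z0-9-]{2,5}$"
)
# Second pattern: a hyphen may be neither the first nor the last char.
NO_LEADING_OR_TRAILING = re.compile(
    r"^(?![^\n-]*-[^\n-]*-)(?=[^a-zA-Z\n]*[a-zA-Z])[a-zA-Z0-9][a-zA-Z0-9-]{1,4}[a-zA-Z0-9]$"
)

for s in ["str-ng", "strin-", "-strin", "st--ng"]:
    print(s, bool(NO_LEADING.search(s)), bool(NO_LEADING_OR_TRAILING.search(s)))
```

Only "strin-" is treated differently: the first pattern accepts it and the second rejects it.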

A regex for letters and space that cannot be a whitespace

I cannot figure out how to combine two regexes. I have these requirements:
Letters and space ^[\p{L} ]+$
Cannot be whitespace ^[^\s]+$
How can I write one regex that combines both? Or is there perhaps some other solution?
You may use
^(?! +$)[\p{L} ]+$
^(?!\s+$)[\p{L}\s]+$
^\s*\p{L}[\p{L}\s]*$
Details
^ - start of string
(?!\s+$) - a negative lookahead that fails the match if the string consists only of 1 or more whitespaces
[\p{L}\s]+ - 1+ letters or whitespaces
$ - end of string.
See the regex demo.
The ^\s*\p{L}[\p{L}\s]*$ is a regex that matches any 0+ whitespaces at the start of the string, then requires a letter that it consumes, and then any 0+ letters/whitespaces may follow.
See the regex demo.
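A Python sketch of the lookahead variant follows, under one loudly stated assumption: Python's built-in re module does not support \p{L}, so the class [^\W\d_] (a word character that is neither a digit nor an underscore) is swapped in. It is an approximation of \p{L}, not an exact equivalent. The names LETTERS_AND_SPACES and is_valid are hypothetical.

```python
import re

# [^\W\d_] stands in for \p{L} here (an approximation, since re has no \p{L}).
LETTERS_AND_SPACES = re.compile(r"^(?!\s+$)(?:[^\W\d_]|\s)+$")


def is_valid(value):
    """True if value holds only letters/whitespace and is not whitespace-only."""
    return LETTERS_AND_SPACES.search(value) is not None


for sample in ["John Smith", "José", "   ", "abc1", ""]:
    print(repr(sample), is_valid(sample))
```

If exact \p{L} semantics matter, a regex engine that supports Unicode property escapes would be the safer choice.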

Finding words in a string that start with number (Regex)

I need to find words in a string that start with a number (i.e. a digit).
In following string:
1st 2nd 3rd a56b 5th 6th ***7th
The words 1st 2nd 3rd 5th 6th should be returned.
I tried with the regex:
(\b[^ a-zA-Z ^ *]+(th|rd|st|nd))+
But this regex returns the words that do not start with letters, and it cannot handle the case where a word starts with special characters.
For the current string, you may use a pattern like
(?<!\S)\d+(?:th|rd|st|nd)\b
See the regex demo
The pattern matches:
(?<!\S) - a location at the start of a string or after a whitespace
\d+ - 1 or more digits
(?:th|rd|st|nd) - one of the four alternatives
\b - a word boundary.
If you plan to match any 0+ non-whitespace chars after a digit that is preceded with a whitespace or is at the start of a string, use
(?<!\S)\d\S*
where \S* will match any 0+ non-whitespace chars.
See this regex demo.
NOTE: In case the lookbehind is not supported, replace (?<!\S) with (?:^|\s) and also wrap the rest of the pattern with a capturing group to access the latter later:
(?:^|\s)(\d\S*)
and the value will be in Group 1.
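The three patterns from this answer can be compared on the sample string with a short Python sketch (the variable names are illustrative). Python's re module supports the lookbehind, so the group-based fallback is included only to show that it yields the same words.

```python
import re

text = "1st 2nd 3rd a56b 5th 6th ***7th"

# Ordinal-specific pattern.
ordinals_only = re.findall(r"(?<!\S)\d+(?:th|rd|st|nd)\b", text)

# Any run of non-whitespace chars that starts with a digit.
with_lookbehind = re.findall(r"(?<!\S)\d\S*", text)

# Fallback without lookbehind: with exactly one capturing group,
# re.findall returns the contents of that group.
without_lookbehind = re.findall(r"(?:^|\s)(\d\S*)", text)

print(ordinals_only)
print(with_lookbehind)
print(without_lookbehind)
```

All three calls return 1st, 2nd, 3rd, 5th and 6th for this input; a56b and ***7th are skipped because their first digit is not preceded by whitespace or the start of the string.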
To get words that start with a number/digit and end with th/st/nd/rd, you can try this.
((?<!\S)(\d+)(th|rd|nd|st))
(?<!\S) detects the word's starting position
\d+ matches 1 or more digits
th|rd|st|nd matches one among those 4.

Matching if all of BCD..n exist after last occurrence of A

I have a source string that looks like this: mID00231mID00008mID00231mID00054mID00013mID00008mID00065
The pattern I am trying to create, using this example, is: For the last occurrence of "mID00231" in the string, one or more occurrences of each of {mID00054, mID00013, mID00008, mID00065} must follow it (in any order).
Examples of matches:
mID00231mID00008mID00231mID00054mID00013mID00008mID00065
mID00231mID00013mID00054mID00008mID00065mID00008
Example of no match because of missing "mID00065":
mID00231mID00054mID00013mID00008
Example of no match because the last occurrence of "mID00231" is not followed by a "mID00054" and a "mID00008":
mID00231mID00013mID00065mID00054mID00008mID00231mID00013mID00065
I am fairly new to regex but usually arrive at something that works. This one has been very difficult. I tried this:
(?:mID00231)(?:(?=.*mID00054)(?=.*mID00013)(?=.*mID00008)(?=.*mID00065).*)
It works if there is only one occurrence of the first element (mID00231). If the element repeats, the pattern fails. Any help is appreciated.
You need to fail the match if the same value appears again later in the string, using a negative lookahead:
mID00231((?!.*mID00231)(?=.*mID00054)(?=.*mID00013)(?=.*mID00008)(?=.*mID00065).*)
         ^^^^^^^^^^^^^^
See the regex demo.
Details:
mID00231 - match a literal mID00231 text
( - start of the capturing group
(?!.*mID00231) - there cannot be mID00231 anywhere after 0+ any chars but a newline
(?=.*mID00054) - there must be mID00054 anywhere after 0+ any chars but a newline
(?=.*mID00013) - there must be mID00013 anywhere after 0+ any chars but a newline
(?=.*mID00008) - there must be mID00008 anywhere after 0+ any chars but a newline
(?=.*mID00065) - there must be mID00065 anywhere after 0+ any chars but a newline
.* - 0+ any chars but a newline
) - end of the capturing group.
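To see the negative lookahead at work, this Python sketch runs the pattern over the four sample strings from the question. The names LAST_A_THEN_ALL and samples are illustrative assumptions.

```python
import re

LAST_A_THEN_ALL = re.compile(
    r"mID00231((?!.*mID00231)(?=.*mID00054)(?=.*mID00013)(?=.*mID00008)(?=.*mID00065).*)"
)

# The four strings are copied from the question, in the same order.
samples = [
    "mID00231mID00008mID00231mID00054mID00013mID00008mID00065",
    "mID00231mID00013mID00054mID00008mID00065mID00008",
    "mID00231mID00054mID00013mID00008",
    "mID00231mID00013mID00065mID00054mID00008mID00231mID00013mID00065",
]

results = [LAST_A_THEN_ALL.search(s) is not None for s in samples]
for s, matched in zip(samples, results):
    print(matched, s)
```

In the first sample the search skips the leading mID00231, because another one follows, and succeeds at the second occurrence; the captured group then holds everything after that last occurrence.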