How to get the first match in regexp?

How to get the first match in regexp? - regex

I have three strings as list below:
Levofloxacin 500mg/100mL
Levofloxacin 500mg
Procaterol Hydrochloride …………… 25μg
The first line, I want to just get 'mg' without 'mL' in my result.
The second line, I want get 'mg'.
The third line, I want get 'ug'.
I have try regexp pattern like:
(?!(.*[ ]{1}[0-9]+))[a-zA-Zμ]+
However, the first line always returns 'mg' with 'mL'...
How could I just acquire 'mg' with regexp?
Any suggestions will be appreciated.

As mentioned in the comment section, try this regex:
^\D*[\d.]+\K[a-zμ]+
Click for Demo
Explanation:
^ - asserts the start of the string
\D* - matches 0+ occurrences of any character that is not a digit
[\d.]+ - matches 1+ occurrences of any character that is a digit
\K - removes what has been matched so far
[a-zμ]+ - this is what you want. This will contain the units like mg, ml appearing after the first number. If there are any other special characters like μ, you can add them too in this character list

Related

How to extract a word that could possibly be followed with another word

I want to extract [games, games, things, things] from
the following array.
Today_games
Today_games_freq
Today_things
Today_things_freq
I have tried Today_(\w+)(?=_freq)?
Which will give me the extra "freq"
And some other combinations, but I couldn't figure out how to get just after the first hyphen.

You can use
Today_(\w+?)(?:_freq)?$
See the regex demo. This matches Today_, then captures any one or more word chars (as few as possible) into Group 1 (with (\w+?)), and then (?:_freq)?$ matches an optional occurrence of a _freq substring and asserts the position at the end of string.
Or,
Today_([^\W_]+)
See this regex demo.
Here, Today_ is matched and the ([^\W_]+) pattern captures one or more alphanumeric chars into Group 1 (same as \w+ with _ subtracted from \w).

Regex: match between last occurrence of a character and first occurrence of another character

I have the following string:
COUNTRY/CITY/street_number_floor_tel
I'd like to extract what is after the last occurrence of '/' and first occurrence of '_'. So the result is:
street
So far I've managed to come up with this regex:
[^/]+(?=_)
which results in the following:
street_number_floor
So I basically don't know how to stop after finding the first occurrence of '_'.
Many thanks for any hints in advance!

You may use
(?<=/)[^/_]+(?=_[^/]*$)
See the regex demo
Details
(?<=/) - a positive lookbehind requiring a / immediately before the match
[^/_]+ - 1+ chars other than / and _
(?=_[^/]*$) - a positive lookahead that requires _, then 0+ chars other than / till the end of string immediately to the right of the current location.

Regex match for multiple characters

I want to write a regex pattern to match a string starting with "Z" and not containing the next 2 characters as "IU" followed by any other characters.
I am using this pattern but it is not working Z[^(IU)]+.*$
ZISADR - should match
ZIUSADR - should not match
ZDDDDR - should match

Try this regex:
^Z(?:I[^U]|[^I]).*$
Click for Demo
Explanation:
^ - asserts the start of the line
Z - matches Z
I[^U] - matches I followed by any character that is not a U
| - OR
[^I] - matches any character that is not a I
.* - matches 0+ occurrences of any character that is not a new line
$ - asserts the end of the line

When you want to negate certain characters in a string, you can use character class but when you want to negate more than one character in a particular sequence, you need to use negative look ahead and write your regex like this,
^Z(?!IU).*$
Demo
Also note, your first word ZISADR will match as Z is not followed by IU
Your regex, Z[^(IU)]+.*$ will match the starting with Z and [^(IU)]+ character class will match any character other than ( I U and ) one or more times further followed by .* means it will match any characters zero or more times which is not the behavior you wanted.
Edit: To provide a solution without look ahead
A non-lookahead based solution would be to use this regex,
^Z(?:I[^U]|[^I]U|[^I][^U]).*$
This regex has three main alternations which incorporate all cases needed to cover.
I[^U] - Ensures if second character is I then third shouldn't be U
[^I]U - Ensures if third character is U then second shouldn't be I
[^I][^U] - Ensures that both second and third characters shouldn't be I and U altogether.
Demo non-look ahead based solution

Regular expression to match dot which has letter after it, before next dot or end of line

I want a regular expression which will match a dot . which has a letter after it at some point before the next dot . or end of line.
For example the following would be valid: .foo.bar.
.foo.123 would be invalid because it contains .123 which has no letters after the dot.
So far I've got:
^([a-z0-9)]|\.(?=.*[a-z].*\.))+$
I understand that the problem with the above is the final match for a . in the positive lookahead: it will always fail to match. I think something like "if dot exists match, else match end of line". If I use ($|\.) in place of the final match this still doesn't work, I assume because it tries both even when a . is matched.
I'd like to avoid using look-behinds. I want match the whole string, not just the dots.

This regex possibly with some small changes should work. ^(?:\.[^\.\s]*[a-zA-Z][^\.\s]*)+$
Regex101 demo.
Breakdown of how it works:
^ - Start of new line
(?:\.[^\.\s]*[a-zA-Z][^\.\s]*) - Grab period followed by all text before the next period or new line. Ensure there is at least one letter.
\. - Start with period.
[^\.\s]* - Anything but a space or . any number of times.
[a-zA-Z] - Ensure at least one letter per period.
[^\.\s]* - Anything but a space or . any number of times.
+ - Once or more
$ - End of line

Regex for finding words with no or only one word between them

I need to find into multiple strings two words with no words or only one word between them. I created the regex for the case to find if those two words exist in string:
^(?=[\s\S]*\bFirst\b)(?=[\s\S]*\bSecond\b)[\s\S]+
and it works correctly.
Then I tried to insert in this regex additional code:
^(?=[\s\S]*\bFirst\b)(\b\w+\b){0,1}(?=[\s\S]*\bSecond\b)[\s\S]+
but it didn't work. It selects text with two or more words between searched words. It is not what I need.
First Second - must be selected
First word1 Second - must be selected
First word1 word2 Second - must be not selected by regex, but my regex select it.
Can I get advise how to solve this problem?

Root cause
You should bear in mind that lookarounds match strings without moving along the string, they "stand their ground". Once you write ^(?=[\s\S]*\bFirst\b)(\b\w+\b){0,1}(?=[\s\S]*\bSecond\b), the execution is as follows:
^ - the regex engine checks if the current position is the start of string
(?=[\s\S]*\bFirst\b) - the positive lookahead requires the presence of any 0+ chars followed with a whole word First - note that the regex index is still at the start of the string after the lookahead returns true or false
(\b\w+\b){0,1} - this subpattern is checked only if the above check was true (i.e. there is a whole word First somewhere) and matches (consumes, moves the regex index) 1 or 0 occurrences of a whole word (i.e. there must be 1 or more word chars right at the string start
(?=[\s\S]*\bSecond\b) - another positive lookahead that makes sure there is a whole word Second somewhere after the first whole word consumed with \b\w+\b - if any. Even if the word Second is the first word in the string, this will return true since backtracking will step back the word matched with (\b\w+\b){0,1} (see, it is optional), and the Second will get asserted, and [\s\S]+ will grab the whole string (Group 1 will be empty). See the regex demo with Second word word2 First string.
So, your approach cannot guarantee the order of First and Second in the string, they are just required to be present but not necessarily in the order you expect.
Solution
If you need to check the order of First and Second in the string, you need to combine all the checks into one single lookahead. The approach might turn out very inefficient with longer strings and multiple alternatives in the lookaround, consider either unrolling the patterns, or trying mutliple regex patterns (like this pseudo-code if /\bFirst\b/.finds_match().index < /\bSecond\b/.finds_match().index => Good, go on...).
If you plan to go on with the regex approach, you may match a string that contains First....Second only in this order:
^(?=[\s\S]*\bFirst(?:\W+\w+)?\W+Second\b)[\s\S]+
See the regex demo
Details:
^ - start of string
(?=[\s\S]*\bFirst(?:\W+\w+)?\W+Second\b) - there must be:
[\s\S]* - any zero or more chars up to the last
\bFirst - whole word First
(?:\W+\w+)? - optional sequence (1 or 0 occurrences) of 1+ non-word chars and 1+ word chars
\W+ - 1+ non-word chars
Second\b - Second as a whole word
[\s\S]+ - any 1 or more characters (empty string won't match).

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

How to get the first match in regexp? - regex

Related

How to extract a word that could possibly be followed with another word

Regex: match between last occurrence of a character and first occurrence of another character

Regex match for multiple characters

Regular expression to match dot which has letter after it, before next dot or end of line

Regex for finding words with no or only one word between them

Categories

Resources