Regex match for multiple characters - regex

I want to write a regex pattern that matches a string starting with "Z" where the next 2 characters are not "IU"; any other characters may follow.
I am using this pattern, but it is not working: Z[^(IU)]+.*$
ZISADR - should match
ZIUSADR - should not match
ZDDDDR - should match

Try this regex:
^Z(?:I[^U]|[^I]).*$
Explanation:
^ - asserts the start of the line
Z - matches Z
I[^U] - matches I followed by any character that is not a U
| - OR
[^I] - matches any character that is not an I
.* - matches 0+ occurrences of any character that is not a new line
$ - asserts the end of the line

When you want to negate certain characters in a string, you can use a character class, but when you want to negate more than one character in a particular sequence, you should use a negative lookahead and write your regex like this:
^Z(?!IU).*$
Also note that your first word, ZISADR, will match, as Z is not followed by IU.
Your regex, Z[^(IU)]+.*$, matches Z, then the character class [^(IU)]+ matches one or more characters other than (, I, U and ), and finally .* matches any characters zero or more times. That is not the behavior you wanted.
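To make the difference concrete, here is a small sketch that runs the question's three sample strings through both patterns. The variable names are only illustrative.

```javascript
// The asker's attempt: [^(IU)] is a character class, so it excludes the
// single characters "(", "I", "U" and ")", not the sequence "IU".
const classAttempt = /Z[^(IU)]+.*$/;
// The negative lookahead rejects the two-character sequence "IU" right after Z.
const negLookahead = /^Z(?!IU).*$/;

const sampleWords = ["ZISADR", "ZIUSADR", "ZDDDDR"];
const results = sampleWords.map((word) => ({
  word,
  attempt: classAttempt.test(word),
  lookahead: negLookahead.test(word),
}));

console.table(results);
// ZISADR:  attempt false, lookahead true
// ZIUSADR: attempt false, lookahead false
// ZDDDDR:  attempt true,  lookahead true
```

The attempt rejects ZISADR because the character right after Z is an I, which the class excludes on its own.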
Edit: To provide a solution without look ahead
A non-lookahead based solution would be to use this regex,
^Z(?:I[^U]|[^I]U|[^I][^U]).*$
This regex has three alternatives that together cover all the required cases.
I[^U] - Ensures that if the second character is I, the third is not U
[^I]U - Ensures that if the third character is U, the second is not I
[^I][^U] - Covers the case where the second character is not I and the third is not U.
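As a rough check, the sketch below compares the lookahead pattern with the three-alternative pattern on a few inputs. The extra inputs (ZDU, ZI, Z) are my own additions, not from the question; they show one difference worth knowing: the alternation consumes two characters after Z, so strings shorter than three characters never match it.

```javascript
const withLookahead = /^Z(?!IU).*$/;
const withAlternation = /^Z(?:I[^U]|[^I]U|[^I][^U]).*$/;

// The first three inputs come from the question; the rest are hypothetical edge cases.
const inputs = ["ZISADR", "ZIUSADR", "ZDDDDR", "ZDU", "ZI", "Z"];
const comparison = inputs.map((s) => [s, withLookahead.test(s), withAlternation.test(s)]);

for (const [s, la, alt] of comparison) {
  console.log(`${s}: lookahead=${la}, alternation=${alt}`);
}
// ZISADR: lookahead=true, alternation=true
// ZIUSADR: lookahead=false, alternation=false
// ZDDDDR: lookahead=true, alternation=true
// ZDU: lookahead=true, alternation=true
// ZI: lookahead=true, alternation=false
// Z: lookahead=true, alternation=false
```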

Related

RegEx: how to not match a repetition

I have the following strings:
test_abc123_firstrow
test_abc1564_secondrow
test_abc123_abc234_thirdrow
test_abc1663_fourthrow
test_abc193_abc123_fifthrow
I want to get the abc plus the following number from each row,
but only the first one if a row has more than one.
My current pattern looks like this: ([aA][bB][cC]\w\d+[a-z]*)
But this doesn't return only the first one.
If somebody could help me implement that, it would be great.
You can use
^.*?([aA][bB][cC]\d+[a-z]*)
Note that \w has been removed: it matches letters, digits and underscores, so it looks redundant in your pattern.
The pattern breaks down as follows:
^ - start of string
.*? - any zero or more chars other than line break chars as few as possible
([aA][bB][cC]\d+[a-z]*) - Capturing group 1: a or A, b or B, c or C, then one or more digits and then zero or more lowercase ASCII letters.
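A minimal sketch of how this could be used, assuming each row is matched separately and the value is read from Group 1 (the helper names are illustrative):

```javascript
const rows = [
  "test_abc123_firstrow",
  "test_abc1564_secondrow",
  "test_abc123_abc234_thirdrow",
  "test_abc1663_fourthrow",
  "test_abc193_abc123_fifthrow",
];

// ^.*? lazily skips to the first "abc" + digits; Group 1 holds that first hit.
const firstAbc = /^.*?([aA][bB][cC]\d+[a-z]*)/;

const firstMatches = rows.map((row) => {
  const m = row.match(firstAbc);
  return m ? m[1] : null; // null when a row has no abc + number at all
});

console.log(firstMatches);
// [ 'abc123', 'abc1564', 'abc123', 'abc1663', 'abc193' ]
```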
Use the following regex:
^.*?([aA][bB][cC]\d+)
Use ^ to begin at the start of the input
.*? matches zero or more characters (except line breaks) as few times as possible (lazy approach)
The rest is then captured in the capturing group as expected.

Match until the first occurrence of the last character of a string

I've tried googling this but all I can find is how to match until a known character occurs. In my case I don't know the character beforehand.
I know I can match the last character of a string with (.?)$, and I know that I can match until a character X occurs with (?:(?!X).)*, but how do I combine the two to match until the first occurrence and not the matched occurrence?
Examples:
character → char
test → t
no match → no match
This is a test → This is a t
I came. I saw. I conquered. → I came.
In pseudocode what I want is basically str.substring(0,str.indexOf(str.lastChar)).
You may use either of the following:
^(?=(.)*$).*?\1
^(?=.*(.)$).*?\1
If you need to match multiline strings, see "How do I match any character across multiple lines in a regular expression?".
Details
^ - start of string
(?=(.)*$) - a positive lookahead capturing each char other than line break chars up to the end of string (last one is saved in Group 1)
.*? - any 0 or more chars other than line break chars as few as possible
\1 - same char as in Group 1.
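Here is a quick sketch that runs the second pattern over the examples from the question. The function name is hypothetical. Note that the match includes the first occurrence of the last character, which is what the examples in the question show.

```javascript
// Lookahead captures the last character in Group 1; .*?\1 then stops
// at the first place that same character occurs.
const untilLastChar = /^(?=.*(.)$).*?\1/;

function prefixThroughLastChar(str) {
  const m = str.match(untilLastChar);
  return m ? m[0] : null; // null for an empty string
}

const examples = [
  "character",
  "test",
  "no match",
  "This is a test",
  "I came. I saw. I conquered.",
];
const prefixes = examples.map(prefixThroughLastChar);

console.log(prefixes);
// [ 'char', 't', 'no match', 'This is a t', 'I came.' ]
```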

Match numbers after first character

I'd like to use Regex to determine whether the characters after the first are all numbers.
For example:
A123 would be valid as after A there are only numbers
A12B would be invalid as, after the first character, there is another letter
I essentially want to ignore the first character
I have so far this:
(?<=A)\w*(?=)
but this makes A12B or A1B2C valid, I only want numbers after A.
You could match a non-digit with \D, followed by one or more digits. If that is the whole string, you can use anchors to assert the start ^ and the end $ of the string.
^\D\d+$
That will match:
^ Start of the string
\D Match a non-digit
\d+ Match 1+ digits, which ensures at least one digit is present
$ End of the string
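A short sketch of this pattern against the strings from the question, plus two extra inputs of my own ("A" and "1123") to show the effect of \D and of + rather than *:

```javascript
const nonDigitThenDigits = /^\D\d+$/;

// "A123", "A12B" and "A1B2C" are from the question; "A" and "1123" are added edge cases.
const candidates = ["A123", "A12B", "A1B2C", "A", "1123"];
const verdicts = candidates.map((s) => nonDigitThenDigits.test(s));

candidates.forEach((s, i) => console.log(`${s} is ${verdicts[i] ? "valid" : "invalid"}`));
// A123 is valid
// A12B is invalid
// A1B2C is invalid
// A is invalid
// 1123 is invalid
```

"A" fails because \d+ needs at least one digit, and "1123" fails because the first character must not be a digit.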
The best solution I can think of is:
^.\d*$
^ - Start of the line
. - Any character (except line terminators)
\d* - A digit (\d), repeated any number of times (*), including 0 times. If you want at least 1, change * to +.
$ - End of the line
// Accept any first character, then only digits until the end of the string.
let regex = /^.\d*$/;
let testStrings = ['A123', 'A12B'];
testStrings.forEach(str => {
  console.log(`${str} is ${regex.test(str) ? 'valid' : 'invalid'}`);
});
Your attempt is very complicated, especially given how simple your goal is.
Succeeding at regexes is all about simplicity.
The first character can be anything, so just use . (a dot).
The next ones are all digits, so you want \d.
Add * to allow any number of repetitions (including none), or use + if you want at least one.
Finally, you need to anchor your regex at the beginning and at the end; otherwise it would match stuff like A123XXXXX or XXXXA123.
Note that some APIs anchor the pattern implicitly (Java's String.matches, for example, requires the whole string to match), in which case you can omit the explicit anchors.
Final regex:
^.\d*$
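Maybe
(?<=.{1,1})([0-9]+)(?=\s)
(?<=.{1,1}) - a lookbehind requiring one character before the digits (.{1,1} is the same as a single .); on its own it does not limit how many characters precede that one
([0-9]+) - at least one digit
(?=\s) - has a whitespace after
Put ^ inside the lookbehind, (?<=^.), to require that this single character is at the beginning of the line
Replace (?=\s) with $ for end of line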
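The sketch below tries both variants, assuming a JavaScript engine with lookbehind support (recent Node.js and browsers). The anchored form (?<=^.)[0-9]+$ is my reading of the two suggested changes combined, not something the answer spells out.

```javascript
// Original suggestion: any single character before the digits, whitespace after.
const looseLookbehind = /(?<=.{1,1})([0-9]+)(?=\s)/;
// Assumed combined form: ^ inside the lookbehind, $ instead of (?=\s).
const anchoredLookbehind = /(?<=^.)[0-9]+$/;

// "AB12 " still matches the loose form: the lookbehind only sees the "B".
const looseHit = looseLookbehind.test("AB12 ");

const anchoredResults = ["A123", "AB12", "A12B"].map((s) => anchoredLookbehind.test(s));

console.log(looseHit);        // true
console.log(anchoredResults); // [ true, false, false ]
```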
^[a-zA-Z][0-9]{3}$
^ - start of the string. Read it together with what follows, ^[a-zA-Z]: the string must start with a letter.
[a-zA-Z] - a-z matches any lowercase letter and A-Z any uppercase letter (change the ranges if required).
[0-9] - any digit
{3} - how many digits to match. Read it as [0-9]{3}: exactly three digits.
$ - end of the string, so in this case the string must end right after the three digits.
Here you can play around - https://regex101.com/r/mqUHvP/5

How to get the first match in regexp?

I have three strings as list below:
Levofloxacin 500mg/100mL
Levofloxacin 500mg
Procaterol Hydrochloride …………… 25μg
In the first line, I want to get just 'mg', without 'mL'.
In the second line, I want to get 'mg'.
In the third line, I want to get 'ug'.
I have tried a regexp pattern like:
(?!(.*[ ]{1}[0-9]+))[a-zA-Zμ]+
However, for the first line it always returns both 'mg' and 'mL'.
How can I get only 'mg' with a regexp?
Any suggestions will be appreciated.
As mentioned in the comment section, try this regex:
^\D*[\d.]+\K[a-zμ]+
Explanation:
^ - asserts the start of the string
\D* - matches 0+ occurrences of any character that is not a digit
[\d.]+ - matches 1+ occurrences of a digit or a dot
\K - resets the start of the reported match, so everything matched so far is dropped from the result
[a-zμ]+ - this is what you want: the unit (such as mg or ml) that appears right after the first number. If other special characters like μ can occur, add them to this character class too.
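\K comes from PCRE and is not available in JavaScript's RegExp. As a sketch of the same idea there, a capturing group can take its place. The sample strings are written with \u escapes for the ellipsis and the Greek mu, and the character class accepts both the micro sign (U+00B5) and the Greek mu (U+03BC), since I can't tell which one the original data uses.

```javascript
// Skip non-digits, consume the first number, then capture the unit in Group 1.
const unitAfterFirstNumber = /^\D*[\d.]+([a-z\u00B5\u03BC]+)/;

const labels = [
  "Levofloxacin 500mg/100mL",
  "Levofloxacin 500mg",
  "Procaterol Hydrochloride \u2026\u2026 25\u03BCg",
];

const units = labels.map((label) => {
  const m = label.match(unitAfterFirstNumber);
  return m ? m[1] : null;
});

console.log(units); // [ 'mg', 'mg', 'μg' ]
```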

I need a unique regex that requires at least one letter and disallows + and any form of blank space

I broke it down to two, but I'm wondering if it's possible in one.
My two regex
/^[^\s+ ]+$/
/(.*[a-zA-Z].*)/
You can use
/^[^+\s]*[a-z][^+\s]*$/i
The pattern matches:
^ - start of string
[^+\s]* - zero or more characters other than + and whitespace
[a-z] - a letter (case insensitive - see /i modifier)
[^+\s]* - zero or more characters other than + and whitespace
$ - end of string
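A few checks of this pattern; the test strings are made up for illustration:

```javascript
const oneLetterNoPlusOrSpace = /^[^+\s]*[a-z][^+\s]*$/i;

// Hypothetical inputs covering the accepted and rejected cases.
const checks = {
  "abc": oneLetterNoPlusOrSpace.test("abc"),         // letters only
  "123a456": oneLetterNoPlusOrSpace.test("123a456"), // one letter among digits
  "123": oneLetterNoPlusOrSpace.test("123"),         // no letter
  "a+b": oneLetterNoPlusOrSpace.test("a+b"),         // contains +
  "a b": oneLetterNoPlusOrSpace.test("a b"),         // contains a space
};

console.log(checks);
// { abc: true, '123a456': true, '123': false, 'a+b': false, 'a b': false }
```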
This expression requires only one letter; any number of characters other than whitespace and a plus may appear on either side of it.
Try this. I'm not sure what you mean by "unique", though:
/^[^+\s]*[A-Za-z][^+\s]*$/
Why not both?
^(?=.*[a-zA-Z])[^\s+]+$
Uses lookahead.
^(?=.*[a-zA-Z])[^\s+]+$
^ start of string
(?=.*[a-zA-Z]) make sure there is at least one letter ahead
[^\s+]+ make sure every character is neither a plus nor a whitespace character
$ end of string
Notice that I changed your [^\s+ ] into [^\s+], because \s already includes the space (U+0020).
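As a sanity check, this sketch compares the combined lookahead pattern with the question's two separate regexes on a handful of made-up inputs; on these samples the two approaches agree.

```javascript
// The asker's original pair of checks.
const noPlusOrSpace = /^[^\s+ ]+$/;
const hasLetter = /(.*[a-zA-Z].*)/;
// The single lookahead-based pattern.
const combined = /^(?=.*[a-zA-Z])[^\s+]+$/;

// Hypothetical inputs: valid, no letter, plus, space, tab, empty.
const probes = ["abc1", "1234", "a+b", "a b", "a\tb", ""];
const agreement = probes.map(
  (s) => combined.test(s) === (noPlusOrSpace.test(s) && hasLetter.test(s))
);

console.log(agreement.every(Boolean)); // true
console.log(probes.map((s) => combined.test(s)));
// [ true, false, false, false, false, false ]
```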