Regex not capturing group that contains period - regex

I'm working on a regex to match each token that starts with a letter in a string such as G71P100Q110U0W0F.01. I've come up with ([A-Z].*?)(?=[A-Z]), which works fine until I reach F.01, where it stops matching. From what I've read, .*? should lazily match anything, but it doesn't seem to. What do I need to add to include the period?
Edit:
Desired matches for the string G71P100Q110U0W0F.01 would be G71, P100, Q110, U0, W0, and F.01. I can iterate through the matches easily enough in VBA.

You can delete the lookahead: (?=[A-Z]). I.e., your regex would be simplified to ([A-Z].*?)
This lookahead makes sure that there will be at least one capital character after the end of .*. However, you already match a capital character at the beginning of your regex: ([A-Z]...). So you need two capital characters, but you have only one.
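To make the effect of the lookahead concrete, here is a minimal sketch in Python (an assumption for illustration; the question itself targets VBA). It runs the original pattern, the pattern with the lookahead removed, and a hypothetical variant, not proposed in this thread, whose lookahead also accepts the end of the string.

```python
import re

SAMPLE = "G71P100Q110U0W0F.01"

# Original pattern: each match must be followed by another capital letter.
original_matches = re.findall(r"([A-Z].*?)(?=[A-Z])", SAMPLE)

# Lookahead removed: the lazy .*? is now free to match nothing at all.
no_lookahead_matches = re.findall(r"([A-Z].*?)", SAMPLE)

# Hypothetical variant: the lookahead also succeeds at the end of the string.
variant_matches = re.findall(r"([A-Z].*?)(?=[A-Z]|$)", SAMPLE)

print(original_matches)
print(no_lookahead_matches)
print(variant_matches)
```

In this sketch the original pattern drops F.01, the version without a lookahead returns only the single letters, and the end-of-string variant returns all six pieces.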
Unfortunately, I don't understand the rules on what you want and don't want to match. It would be cool to have more examples both for matching and not matching strings.
Probably this regex would be good for you:
([A-Z].*?)\.[0-9]+
It makes sure that your text:
starts with a capital letter
ends with a dot, and then one or more numbers

What you are trying to do is:
[A-Z][^A-Z]*
Match an uppercase letter then anything but an uppercase letter.
From what I've read, the .*? should match anything lazily...
and that is exactly what is happening: it stops as soon as the following character is an uppercase letter. F.01 is never followed by one, so the lookahead fails there.
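A short Python sketch of the negated-class approach (Python and the helper name split_words are assumptions for illustration, not part of the answer):

```python
import re

SAMPLE = "G71P100Q110U0W0F.01"

def split_words(text: str) -> list[str]:
    """Return each uppercase letter together with the non-uppercase run after it."""
    return re.findall(r"[A-Z][^A-Z]*", text)

print(split_words(SAMPLE))
```

Because the class [^A-Z] simply stops at the next capital, no lookahead is needed and the final token is not a special case.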

Try this:
[A-Z]\.?[0-9]+
Period must be escaped.

I assume you are looking for a regex pattern that matches a sequence of non-space character(s) starting with a letter:
\b[a-zA-Z]\S*

[A-Z][^A-Z\s]+
[A-Z] matches a single uppercase letter
[^A-Z\s]+ matches one or more characters that are neither whitespace nor an uppercase letter
Run code sample for demo
var input = "G71P100Q110U0W0F.01"
console.log(input.match(/[A-Z][^A-Z\s]+/g))
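For comparison, the remaining patterns from the answers above can be run side by side on the sample string. This is only a sketch in Python (an assumption; the thread uses VBA and JavaScript), and the dictionary keys are descriptive labels made up for this example.

```python
import re

SAMPLE = "G71P100Q110U0W0F.01"

# Patterns quoted from the answers above, keyed by a made-up description.
PATTERNS = {
    "lazy prefix + decimal": r"([A-Z].*?)\.[0-9]+",
    "letter, optional dot, digits": r"[A-Z]\.?[0-9]+",
    "letter then non-space run": r"\b[a-zA-Z]\S*",
    "letter then non-uppercase, non-space": r"[A-Z][^A-Z\s]+",
}

# findall returns group 1 when the pattern has a group, else the whole match.
results = {name: re.findall(pattern, SAMPLE) for name, pattern in PATTERNS.items()}

for name, found in results.items():
    print(f"{name}: {found}")
```

On this particular input, only the second and fourth patterns split the string into the six desired pieces; the other two each return one long match.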

Related

How to make a pattern that check that all the first letter should be capital?

I need a pattern in Angular that checks that the first letter of each word is a capital.
To do this, I am using this pattern:
pattern ="^([A-Z][a-z]*((\\s[A-Za-z])?[a-z]*)*)$"
1. It works only for the first letter.
2. When I have, for example, two words, it fails; I want to check the first letter of every word.
You can try to use this regex pattern:
^(\b[A-Z]\w*\s*)+$
Please try this regex pattern :
/([A-Z][\w-]*(\s+[A-Z][\w-]*)+)/
Based on https://stackoverflow.com/a/4113070/8090014
Your pattern works only for the first letter of the first word because only that word has to start with an uppercase A-Z. After that, the repeated group starts with \s[A-Za-z], which also matches a lowercase a-z.
Note that \s also matches a newline. If you don't want that, you could match either a space or a tab using a character class [ \t]
You could start the match with A-Z and also start the repeated group with A-Z. If you want to match words, you could use the word character \w
^[A-Z]\w*(?:[\t ]+[A-Z]\w*)*$
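A quick way to check this behaviour is a small Python sketch (Python is an assumption here, since the question is about an Angular pattern; the helper name all_words_capitalized is made up). It also compiles the pattern from the question, with the doubled backslash reduced to a single one, to show that it accepts a lowercase second word.

```python
import re

# Pattern from the answer: every word starts with A-Z, separated by spaces or tabs.
TITLE_CASE = re.compile(r"^[A-Z]\w*(?:[\t ]+[A-Z]\w*)*$")

# Pattern from the question (\\s written as \s).
ORIGINAL = re.compile(r"^([A-Z][a-z]*((\s[A-Za-z])?[a-z]*)*)$")

def all_words_capitalized(text: str) -> bool:
    """True when every space- or tab-separated word starts with A-Z."""
    return TITLE_CASE.search(text) is not None

for text in ["Hello World", "Hello world", "hello World"]:
    print(text, all_words_capitalized(text))
```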

How to create proper regular expression to find last character which I want to?

I need to create a regex to find the last underscore in a string like 012344_2.0224.71_3 or 012354_5.00123.AR_3.335_8
I wanted to find the last part with the expression [^.]+$ and then find the underscore in the result, but I can't get it to work.
I hope you can help me :)
Just use a negative character class [^_] that will match everything except an underscore (this helps to ensure no other underscores are found afterwards) and end of string $
Pattern would look as such:
(_)[^_]*$
The final underscore _ is in a capturing group, so you are wanting to return the submatch. You would replace the group 1 (your underscore).
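As a sketch of replacing only the captured submatch, here is one way it could look in Python (an assumption; the question does not name a language, and replace_last_underscore is a hypothetical helper):

```python
import re

LAST_UNDERSCORE = re.compile(r"(_)[^_]*$")

def replace_last_underscore(text: str, replacement: str) -> str:
    """Swap only the captured underscore (group 1), leaving the tail untouched."""
    match = LAST_UNDERSCORE.search(text)
    if match is None:
        return text  # no underscore at all
    start, end = match.span(1)
    return text[:start] + replacement + text[end:]

print(replace_last_underscore("012344_2.0224.71_3", "-"))
```

The whole match includes the characters after the underscore, which is why the sketch slices on the span of group 1 rather than substituting the full match.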
See it live: Regex101
Notice the green highlighted portion on Regex101, this is your submatch and is what would be replaced.
The simplest solution I can imagine is using .*\K_, however not all regex flavours support \K.
If not, another idea would be to use _(?=[^_]*$)
You have a demo of the first and second option.
Explanation:
.*\K_: Matches any characters up to an underscore. Since the * quantifier is greedy, it will match up to the last underscore. Then \K discards what has been matched so far, and then we match the underscore.
_(?=[^_]*$): Matches an underscore followed only by non-underscore characters up to the end of the line
If you want nothing but the "net" (i.e., nothing matched except the last underscore), use positive lookahead to check that no more underscores are in the string:
/_(?=[^_]*$)/gm
The pattern [^.]+$ matches one or more characters other than a dot and then asserts the end of the string. That will give you the matches 71_3 and 335_8
What you want to match is an underscore when there are no more underscores following.
One way to do that is using a negative lookahead (?!.*_), if that is supported, which asserts that what is on the right is not any run of characters followed by an underscore
_(?!.*_)
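The two lookahead patterns from this thread can be compared with a short Python sketch (Python is an assumption; last_underscore_index is a made-up helper). The \K variant is left out because, as far as I know, Python's built-in re module is one of the flavours that does not support \K.

```python
import re

SAMPLES = ["012344_2.0224.71_3", "012354_5.00123.AR_3.335_8"]

POSITIVE = re.compile(r"_(?=[^_]*$)")  # underscore followed only by non-underscores
NEGATIVE = re.compile(r"_(?!.*_)")     # underscore with no later underscore

def last_underscore_index(pattern: re.Pattern, text: str) -> int:
    """Index of the matched underscore, or -1 when the pattern finds nothing."""
    match = pattern.search(text)
    return match.start() if match else -1

for sample in SAMPLES:
    print(sample,
          last_underscore_index(POSITIVE, sample),
          last_underscore_index(NEGATIVE, sample))
```

Both patterns match the underscore alone, so the match itself can be replaced directly without a capturing group.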

RegEx more than multiple characters before number

I really don't use RegEx that much. You could say I am RegEx n00b. I have been working on this issue for a half a day.
I am trying to write a pattern that looks backward from a number character. For example:
1. bob1 => bob
2. cat3 => cat
3. Mary34 => Mary
So far I have this (?![A-Z][a-z]{1,})([A-Za-z_])
It only matches for individual characters, I want all the characters before the number character. I tried to add the ^ and $ into my pattern and using an online simulator. I am unsure where to put the ^ and $.
NOTE: I am using RegEx for the .NET Framework
You may use a regex like
[\p{L}_]+(?=\d)
or
[\w-[\d]]+(?=\d)
Pattern details
[\p{L}_]+ - any 1 or more letters (both lower- and uppercase) and/or _
OR
[\w-[\d]]+ - 1 or more word chars except digits (the -[] inside a character class is a character class subtraction construct)
(?=\d) - a positive lookahead that requires a digit to appear immediately to the right of the current location
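For readers trying this outside .NET: as far as I know, Python's built-in re module supports neither \p{L} nor character class subtraction. A rough stand-in (a swapped-in technique, not the answer's own pattern) is the double-negated class [^\W\d], meaning "a word character that is not a digit". The helper name prefix_before_number is hypothetical.

```python
import re

# Stand-in for [\w-[\d]]+(?=\d): word characters except digits, then a digit ahead.
LETTERS_BEFORE_DIGIT = re.compile(r"[^\W\d]+(?=\d)")

def prefix_before_number(text: str):
    """Return the run of letters/underscores right before a digit, or None."""
    match = LETTERS_BEFORE_DIGIT.search(text)
    return match.group() if match else None

for sample in ["bob1", "cat3", "Mary34"]:
    print(sample, "=>", prefix_before_number(sample))
```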
If we break down your RegEx, we see:
(?![A-Z][a-z]{1,}) which says "look ahead and make sure the text here is NOT one uppercase letter followed by one or more lowercase letters" and ([A-Za-z_]) which says "match one letter or underscore". This ends up matching a single letter at a time (for your examples, each lowercase letter), never the whole run of letters.
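The breakdown can be checked with a small Python sketch (Python stands in for .NET here, which is an assumption; this particular pattern uses only syntax common to both, and single_char_matches is a made-up name):

```python
import re

# The pattern from the question, unchanged.
ATTEMPT = re.compile(r"(?![A-Z][a-z]{1,})([A-Za-z_])")

def single_char_matches(text: str) -> list[str]:
    """Return every match of the question's pattern; each is one character long."""
    return ATTEMPT.findall(text)

print(single_char_matches("bob1"))
print(single_char_matches("Mary34"))
```

Note that in "Mary34" the leading M is skipped, because at that position the negative lookahead sees an uppercase letter followed by lowercase letters.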
If I understand what you want to achieve, then you want all of the letters before a number. I would write something like that as:
\b([a-zA-Z]+)[0-9]
This will start at a word boundary \b, match one or more letters, and require a digit right after the matched string.
(The syntax I used seems to match this document about .NET RegEx: https://learn.microsoft.com/en-us/dotnet/standard/base-types/regular-expressions)
In light of Wiktor Stribizew's comment, here is a pure match RegEx:
\b[a-zA-Z_]+(?=[0-9])
This matches the pattern and then looks ahead for the digit. This is better than my first lookahead attempt. (Thank you Wiktor.)
http://www.rexegg.com/regex-lookarounds.html
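The difference between the two patterns in this answer is whether the digit is consumed. A minimal Python sketch (Python is an assumption in place of .NET; the variable names are made up):

```python
import re

CONSUMING = re.compile(r"\b([a-zA-Z]+)[0-9]")     # digit is part of the match
LOOKAHEAD = re.compile(r"\b[a-zA-Z_]+(?=[0-9])")  # digit is only checked

consumed = CONSUMING.search("Mary34")
peeked = LOOKAHEAD.search("Mary34")

# With the consuming pattern the letters are only available as group 1.
print(consumed.group(0), consumed.group(1))
# With the lookahead the whole match is already just the letters.
print(peeked.group(0))
```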

check if there is a word repeated at least 2 or more times. (Regular Expression)

Using a regular expression, I want to match any line of input that has at least one word repeated two or more times.
Here is how far I got:
/(\b\w+\b).*\1/
But it is wrong, because the back-reference can match a single character inside another word rather than a whole word.
input: i might be ill
output: < i might be i>ll
<> marks the matched part.
So I tried (\b\w+\b)(\b\w+\b)*\1
but it does not work at all.
Can someone give help?
Thanks.
this should work
(\b\w+\b).*\b\1\b
The greedy .* ensures the longest match. If you want the second instance to be a separate word, you have to add the boundaries there as well. So it's the same as
\b(\w+)\b.*\b\1\b
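A brief Python sketch (Python is an assumption, as the question does not name a language; has_repeated_word is a hypothetical helper) contrasting the pattern from the question with the bounded version:

```python
import re

LOOSE = re.compile(r"(\b\w+\b).*\1")       # back-reference may land inside a longer word
STRICT = re.compile(r"\b(\w+)\b.*\b\1\b")  # back-reference must be a whole word

def has_repeated_word(line: str) -> bool:
    """True when some whole word occurs at least twice on the line."""
    return STRICT.search(line) is not None

print(LOOSE.search("i might be ill").group(0))
print(has_repeated_word("i might be ill"))
print(has_repeated_word("the cat and the dog"))
```

The word boundaries around \1 are what stop the lone "i" from pairing with the first letter of "ill".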
Positive lookahead is not a must here:
/\b([A-Za-z]+)\b[\s\S]*\b\1\b/g
EXPLANATION
\b([A-Za-z]+)\b # match any word
[\s\S]* # match any character (newline included) zero or more times
\b\1\b # word repeated
To check for repeated words you can use positive lookahead like this.
Regex: (\b[A-Za-z]+\b)(?=.*\b\1\b)
Explanation:
(\b[A-Za-z]+\b) will capture any word.
(?=.*\b\1\b) looks ahead to check whether the word captured by group 1 appears again later. If it does, a match is found.
Note: this will report the same word more than once, because a word that has already matched will match again at its next occurrence whenever yet another occurrence follows.
You will have to strip the duplicates in your own code.
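Stripping the duplicates could look like this in Python (an assumption; repeated_words is a made-up helper and the sample sentence is invented for the example):

```python
import re

REPEATED = re.compile(r"(\b[A-Za-z]+\b)(?=.*\b\1\b)")

def repeated_words(line: str) -> list[str]:
    """Return each repeated word once, in order of first appearance."""
    hits = REPEATED.findall(line)
    return list(dict.fromkeys(hits))  # drop duplicates, keep order

line = "the cat saw the cat and the dog"
print(REPEATED.findall(line))
print(repeated_words(line))
```

dict.fromkeys is used here only because it preserves insertion order; a set would also remove duplicates but lose the order.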

Regex to match first word in sentence

I am looking for a regex that matches first word in a sentence excluding punctuation and white space. For example: "This" in "This is a sentence." and "First" in "First, I would like to say \"Hello!\""
This doesn't work:
"""([A-Z].*?(?=^[A-Za-z]))""".r
(?:^|(?:[.!?]\s))(\w+)
Will match the first word in every sentence.
http://rubular.com/r/rJtPbvUEwx
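A small Python sketch of this pattern (Python is an assumption; the question's snippet looks like Scala, and first_words is a hypothetical helper). The second sample sentence is invented for the example.

```python
import re

FIRST_WORDS = re.compile(r"(?:^|(?:[.!?]\s))(\w+)")

def first_words(text: str) -> list[str]:
    """Return the first word of each sentence (group 1 of every match)."""
    return FIRST_WORDS.findall(text)

print(first_words("This is a sentence."))
print(first_words("This is a sentence. Another one! Third?"))
```

A sentence boundary here is just one of . ! ? followed by a whitespace character, so abbreviations such as "e.g. this" would also count as a boundary.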
This is an old thread but people might need this like I did.
None of the above works if your sentence starts with one or more spaces.
I did this to get the first (non empty) word in the sentence :
(?<=^[\s"']*)(\w+)
Explanation:
(?<=^[\s"']*) positive lookbehind in order to look for the start of the string, followed by zero or more spaces or punctuation characters (you can add more between the brackets), but do not include it in the match.
(\w+) the actual match of the word, which will be returned
The following words in the sentence are not matched as they do not satisfy the lookbehind.
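The lookbehind above has variable length, which not every engine accepts; Python's built-in re module, for one, requires fixed-width lookbehinds. A rough equivalent there (a swapped-in technique, not this answer's pattern) consumes the leading characters and reads the word from group 1. The helper name first_word is made up.

```python
import re

# Skip leading whitespace and quote characters, then capture the first word.
FIRST_WORD = re.compile(r"^[\s\"']*(\w+)")

def first_word(sentence: str):
    """Return the first word of the sentence, or None if there is none."""
    match = FIRST_WORD.search(sentence)
    return match.group(1) if match else None

print(first_word('   "Hello!" she said'))
print(first_word("This is a sentence."))
```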
You can use this regex: ^[^\s]+ or ^[^ ]+.
You can use this regex: ^\s*([a-zA-Z0-9]+).
The first word can be found at a captured group.
[a-z]+
This should be enough, as it will match the first run of a-z characters (assuming the case-insensitive flag is set).
In case it doesn't work, you could try [a-z]+\b, or even ^[a-z]+\b, but the last one assumes that the string starts with the word.