I am looking for a regex that matches the first word in a sentence, excluding punctuation and white space. For example: "This" in "This is a sentence." and "First" in "First, I would like to say \"Hello!\""
This doesn't work:
"""([A-Z].*?(?=^[A-Za-z]))""".r
(?:^|(?:[.!?]\s))(\w+)
This will match the first word in every sentence; the word itself is in capture group 1.
http://rubular.com/r/rJtPbvUEwx
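A minimal JavaScript sketch of this answer (the helper name firstWords is hypothetical). Because the anchor part is non-capturing, each sentence's first word is read from capture group 1 of every match:

```javascript
// Hypothetical helper: returns the first word of every sentence in `text`.
function firstWords(text) {
  // (?:^|(?:[.!?]\s)) : start of the string, or ". ", "! ", "? "
  // (\w+)             : the word itself, captured in group 1
  const re = /(?:^|(?:[.!?]\s))(\w+)/g;
  return [...text.matchAll(re)].map((m) => m[1]);
}

console.log(firstWords('This is a sentence. First, I would like to say "Hello!"').join(", "));
// This, First
```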
This is an old thread, but people might need this like I did.
None of the above works if your sentence starts with one or more spaces.
I did this to get the first (non-empty) word in the sentence:
(?<=^[\s"']*)(\w+)
Explanation:
(?<=^[\s"']*) is a positive lookbehind: it requires that everything from the start of the string up to the match is zero or more whitespace or quote characters (you can add more punctuation between the brackets), without including those characters in the match. This is a variable-length lookbehind, which not every regex engine supports.
(\w+) is the actual match: the word that will be returned.
The following words in the sentence are not matched as they do not satisfy the lookbehind.
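A quick JavaScript check of the lookbehind approach. JavaScript accepts variable-length lookbehind (ES2018 and later); some other engines, such as Python's re module, only allow fixed-width lookbehind and will reject this pattern. The helper name firstWord is hypothetical:

```javascript
// Variable-length lookbehind: only whitespace and quotes may precede the word.
const firstWordRe = /(?<=^[\s"']*)(\w+)/;

// Hypothetical helper: the first word, or null if the string has no word characters.
function firstWord(text) {
  const m = text.match(firstWordRe);
  return m ? m[1] : null;
}

console.log(firstWord('  "Hello," she said.')); // Hello
console.log(firstWord("This is a sentence.")); // This
```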
You can use this regex: ^[^\s]+ or ^[^ ]+.
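Trying this suggestion in JavaScript shows one side effect worth knowing: since the class only excludes whitespace, punctuation attached to the first word stays in the match (nonSpaceRe is just an illustrative name):

```javascript
// ^[^\s]+ takes everything up to the first whitespace character.
const nonSpaceRe = /^[^\s]+/;

console.log("This is a sentence.".match(nonSpaceRe)[0]); // This
console.log('First, I would like to say "Hello!"'.match(nonSpaceRe)[0]); // First,  (comma kept)
```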
You can use this regex: ^\s*([a-zA-Z0-9]+).
The first word is in capture group 1.
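A small JavaScript sketch reading group 1 (leadingWordRe and leadMatch are illustrative names). Note that only whitespace is skipped, so a string that starts with a quote or other punctuation gives no match:

```javascript
// ^\s* skips leading whitespace; group 1 is the first run of ASCII letters/digits.
const leadingWordRe = /^\s*([a-zA-Z0-9]+)/;

const leadMatch = '   First, I would like to say "Hello!"'.match(leadingWordRe);
console.log(leadMatch[1]); // First

console.log('"Hello" she said'.match(leadingWordRe)); // null (starts with a quote)
```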
[a-z]+
This should be enough, as it matches the first run of letters a-z (assuming a case-insensitive match).
If it doesn't work, you could try [a-z]+\b, or even ^[a-z]+\b, but the last one assumes that the string starts with the word.
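A JavaScript sketch of these options, using the i flag for the case-insensitive matching the answer assumes. The anchored variant is written here as ^[a-z]+\b; the + is what lets it match the whole word rather than a single letter:

```javascript
const lettersRe = /[a-z]+/i; // first run of letters anywhere in the string
const anchoredRe = /^[a-z]+\b/i; // only matches if the string starts with the word

console.log('First, I would like to say "Hello!"'.match(lettersRe)[0]); // First
console.log('"Hello!" she said'.match(lettersRe)[0]); // Hello
console.log('"Hello!" she said'.match(anchoredRe)); // null
```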
Related
I need a pattern in Angular that checks only whether the first letter of each word is a capital.
To do this, I am using this pattern:
pattern ="^([A-Z][a-z]*((\\s[A-Za-z])?[a-z]*)*)$"
1. It works only for the first letter.
2. When I have, for example, two words, it fails; I want to check the first letter of each word.
You can try to use this regex pattern:
^(\b[A-Z]\w*\s*)+$
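Angular patterns are ordinary JavaScript regular expressions, so the pattern can be tried with RegExp.prototype.test outside Angular (titleCaseRe is an illustrative name):

```javascript
// Each repetition must start at a word boundary with an uppercase letter.
const titleCaseRe = /^(\b[A-Z]\w*\s*)+$/;

console.log(titleCaseRe.test("Hello World")); // true
console.log(titleCaseRe.test("Hello world")); // false
```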
Please try this regex pattern:
/([A-Z][\w-]*(\s+[A-Z][\w-]*)+)/
Based on https://stackoverflow.com/a/4113070/8090014
Your pattern works only for the first letter in the first word because the word has to start with an uppercase A-Z. But after that, the repeated group starts with \s[A-Za-z] which would also match a lowercase a-z.
Note that \s also matches a newline. If you don't want that, you could match either a space or a tab using the character class [ \t].
You could start the match with A-Z and also start the repeated group with A-Z. To match the rest of each word, you could match word characters with \w:
^[A-Z]\w*(?:[\t ]+[A-Z]\w*)*$
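A JavaScript check of this pattern (capitalizedWordsRe is an illustrative name), including a newline case to show the effect of using [\t ] instead of \s:

```javascript
// Words are separated only by spaces or tabs, so a newline makes the test fail.
const capitalizedWordsRe = /^[A-Z]\w*(?:[\t ]+[A-Z]\w*)*$/;

for (const s of ["Hello World", "Hello world", "Hello\nWorld"]) {
  console.log(JSON.stringify(s), capitalizedWordsRe.test(s));
}
// "Hello World" true
// "Hello world" false
// "Hello\nWorld" false
```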
I'm working on a regex to match anything starting with a letter in a string similar to G71P100Q110U0W0F.01. I've come up with ([A-Z].*?)(?=[A-Z]), which works fine until I reach F.01, where it stops matching. From what I've read, the .*? should match anything lazily, but it doesn't. What do I need to add to include the period?
Edit:
Desired matches for the string G71P100Q110U0W0F.01 would be G71, P100, Q110, U0, W0, and F.01. I can iterate through the matches easily enough in VBA.
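You can delete the lookahead (?=[A-Z]), i.e. your regex would be simplified to ([A-Z].*?). (Be aware, though, that with nothing after it the lazy .*? matches as few characters as possible, i.e. none, so on its own this returns only the single letters G, P, Q, U, W and F.)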
This lookahead makes sure that there is at least one capital letter after the end of .*?. However, you already match a capital letter at the beginning of your regex, ([A-Z]...), so each match needs two capital letters: its own and the next one. The last token, F.01, has no capital letter after it, so it is never matched.
Unfortunately, I don't understand the rules for what you want and don't want to match. More examples of both matching and non-matching strings would help.
This regex may work for you:
([A-Z].*?)\.[0-9]+
It makes sure that your text:
starts with a capital letter
ends with a dot followed by one or more digits
What you are trying to do is:
[A-Z][^A-Z]*
Match an uppercase letter, then zero or more characters that are not uppercase letters.
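Applied with the g flag in JavaScript, this pattern returns each letter-plus-value token, including F.01 (tokenRe and tokens are illustrative names):

```javascript
// An uppercase letter followed by everything up to (not including) the next uppercase letter.
const tokenRe = /[A-Z][^A-Z]*/g;
const tokens = "G71P100Q110U0W0F.01".match(tokenRe);
console.log(tokens.join(", ")); // G71, P100, Q110, U0, W0, F.01
```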
From what I've read, the .*? should match anything lazily...
and that is exactly what's happening: it stops as soon as the following character is an uppercase letter. F.01 is the last token, so there is no uppercase letter after it and the lookahead never succeeds.
Try this:
[A-Z]\.?[0-9]+
The period must be escaped; an unescaped . matches any character.
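A quick JavaScript run of this pattern (codeRe and codes are illustrative names). One caveat: it only allows the period directly after the letter, so a hypothetical value such as F1.5 would come back as F1:

```javascript
// \. is a literal period, optional (?) and only allowed right after the letter.
const codeRe = /[A-Z]\.?[0-9]+/g;
const codes = "G71P100Q110U0W0F.01".match(codeRe);
console.log(codes.join(", ")); // G71, P100, Q110, U0, W0, F.01

console.log("F1.5".match(codeRe).join(", ")); // F1
```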
I assume you are looking for a regex pattern that matches a sequence of non-space character(s) starting with a letter:
\b[a-zA-Z]\S*
[A-Z][^A-Z\s]+
[A-Z] matches a single uppercase letter
[^A-Z\s]+ matches one or more characters that are neither whitespace nor an uppercase letter
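const input = "G71P100Q110U0W0F.01";
// Each match is an uppercase letter plus the non-uppercase, non-space characters after it.
console.log(input.match(/[A-Z][^A-Z\s]+/g)); // matches G71, P100, Q110, U0, W0 and F.01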
I'm new to regex; this is the first time I've used it.
Given a string with multiple words, I need to extract the second word (word = any number of characters between two spaces).
For example: "hi baby you're my love"
I need to extract "baby"
I think I could start from this: (\b\w*\b) that matches every single word, but I don't know how to make it skip the first match.
Thanks for the suggestions, guys.
I modified your regex a little and finally found what I need:
(?<=\s)(.*?)(?=\s)
This one, (?<=.)(\b\w+\b), was also fairly good, but it fails on a string like "hi ba-by you're my love", splitting "ba-by" into "ba" and "by".
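A JavaScript sketch comparing the two patterns on the hyphenated example (hyphenText is an illustrative name). One limitation of the first pattern: the (?=\s) lookahead needs whitespace after the word, so for a two-word string such as "hi baby" it finds nothing:

```javascript
const hyphenText = "hi ba-by you're my love";

// (.*?) between two whitespace characters keeps "ba-by" whole.
console.log(hyphenText.match(/(?<=\s)(.*?)(?=\s)/)[1]); // ba-by

// \w does not include "-", so this version stops at the hyphen.
console.log(hyphenText.match(/(?<=.)(\b\w+\b)/)[1]); // ba

// No whitespace after the second word, so no match.
console.log("hi baby".match(/(?<=\s)(.*?)(?=\s)/)); // null
```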
You can do it even without \b.
Use \w+\s+(\w+) and read the word from capturing group 1.
The regex above:
First, it matches a non-empty sequence of word characters (the first word).
Then it matches a non-empty sequence of whitespace characters between words 1 and 2.
And finally, the capturing group captures just the second word.
Note that \s+(\w+) is wrong, because the source string can begin with a space,
and in that case this regex would have caught the first word.
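A JavaScript sketch of the capture-group approach, alongside the \s+(\w+) variant the answer warns about (secondWordRe is an illustrative name):

```javascript
const secondWordRe = /\w+\s+(\w+)/;

console.log("hi baby you're my love".match(secondWordRe)[1]); // baby
console.log("  hi baby".match(secondWordRe)[1]); // baby (leading spaces are skipped)

// The \s+(\w+) variant picks up the first word when the string starts with spaces.
console.log("  hi baby".match(/\s+(\w+)/)[1]); // hi
```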
I'm fairly new to regex. I'm looking for an expression which will return results which meet the following criteria:
The first word must be 3 letters or more
The last word must be 3 characters or more
If any word or words in between the first and last word contain ONLY 1 letter, then return that phrase
Every other word between the first and last word (apart from the single-letter words) must be 3 letters or more
I would like it to return phrases like:
'Therefore a hurricane shall arrive' and 'However I know I like Michael Smith'
There should be a space between each word.
So far I have:
^([A-Za-z]{3,})*$( [A-Za-z])*$( [A-Za-z]{3,})*$
Any help would be appreciated. Is it something to do with the spacing? I'm using an application called 'Oracle EDQ'.
In a normal regex world you'd use a \b, a word boundary.
^[a-zA-Z]{3,}(\s+|\b([a-zA-Z]|[a-zA-Z]{3,})\b)*\s+[a-zA-Z]{3,}$
(The two \b word boundaries around the middle-word alternation are the key part.)
And perhaps use non-capturing groups (as anubhava shows).
From what I see, Oracle EDQ regex syntax has no word boundaries (and no non-capturing groups), so you should rely on \s, which matches whitespace.
So make the whitespace obligatory, either with
^[a-zA-Z]{3,}(\s+|\s([a-zA-Z]|[a-zA-Z]{3,}))*\s+[a-zA-Z]{3,}$
(Here the \s before each middle word is required.)
OR
^[a-zA-Z]{3,}(\s+|([a-zA-Z]|[a-zA-Z]{3,})\s)*\s*[a-zA-Z]{3,}$
(Here the \s after each middle word is required, and the final \s+ becomes \s*.)
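A rough check of the three patterns above using JavaScript's engine (edqPatterns and edqSamples are illustrative names). This is only an approximation: Oracle EDQ's regex dialect may behave differently (the answer notes it lacks \b), so the samples should be re-tested in EDQ itself:

```javascript
// The three patterns from this answer, in order.
const edqPatterns = [
  /^[a-zA-Z]{3,}(\s+|\b([a-zA-Z]|[a-zA-Z]{3,})\b)*\s+[a-zA-Z]{3,}$/,
  /^[a-zA-Z]{3,}(\s+|\s([a-zA-Z]|[a-zA-Z]{3,}))*\s+[a-zA-Z]{3,}$/,
  /^[a-zA-Z]{3,}(\s+|([a-zA-Z]|[a-zA-Z]{3,})\s)*\s*[a-zA-Z]{3,}$/,
];

const edqSamples = [
  "Therefore a hurricane shall arrive", // should match
  "However I know I like Michael Smith", // should match
  "Therefore ab hurricane shall arrive", // "ab" has two letters: should not match
];

for (const re of edqPatterns) {
  console.log(edqSamples.map((s) => re.test(s))); // [ true, true, false ] for each pattern
}
```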
You can use this regex:
^[a-zA-Z]{3,}(?:\s+(?:[a-zA-Z]|[a-zA-Z]{3,}))*\s+[a-zA-Z]{3,}$
I'm trying to match the last four characters (alphanumeric) of all words beginning with the sequence &c.
For instance, in the string below, I'd like to match 2AC3 and DE4A:
Colour one is &cFF2AC3 and colour two is &c22DE4A.
Can anybody help me with the correct regex expression? I've spent hours on this great resource to no avail.
It looks like hexadecimal numbers, so use this pattern:
&c[0-9A-F]{2}\K([0-9A-F]{4})
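\K (reset the start of the reported match) is a PCRE/Perl feature, and JavaScript does not support it. A JavaScript sketch of the same idea swaps \K for a lookbehind; this is a substitution, not the answer's own pattern (hexTailRe and colourText are illustrative names):

```javascript
// The lookbehind requires "&c" plus two hex digits before the match without including them.
const hexTailRe = /(?<=&c[0-9A-F]{2})[0-9A-F]{4}/g;

const colourText = "Colour one is &cFF2AC3 and colour two is &c22DE4A.";
console.log(colourText.match(hexTailRe).join(" ")); // 2AC3 DE4A
```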
This:
/(?i)\s*&c(?:[a-z0-9]{2})([a-z0-9]{4})\b/
Append the g flag if you want it to find all matches in a given text.
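JavaScript regex literals don't accept a leading (?i), so in a JavaScript sketch the case-insensitivity moves into the i flag, next to the g flag the answer mentions (caseInsRe and colourText2 are illustrative names):

```javascript
// Same pattern, with (?i) replaced by the i flag; group 1 holds the last four characters.
const caseInsRe = /\s*&c(?:[a-z0-9]{2})([a-z0-9]{4})\b/gi;

const colourText2 = "Colour one is &cFF2AC3 and colour two is &c22DE4A.";
console.log([...colourText2.matchAll(caseInsRe)].map((m) => m[1]).join(" ")); // 2AC3 DE4A
```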
Try this
/(?:^| )&c\w*(\w{4})\b/
If you want to try it in the regex tester you linked to, make sure to use the g modifier to see all matches.
Explanation: (?:^| ) matches either a space or the start of the string; &c\w* matches the ampersand, the c, and as many following word characters as possible; then (\w{4}) captures the last 4 characters. The \b at the end asserts a word boundary (the next character is a non-word character or the end of the string).
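A JavaScript sketch of this pattern, collecting group 1 from every match (lastFourRe and colourText3 are illustrative names):

```javascript
// \w* is greedy, so it gives back exactly four characters for (\w{4}) before the \b.
const lastFourRe = /(?:^| )&c\w*(\w{4})\b/g;

const colourText3 = "Colour one is &cFF2AC3 and colour two is &c22DE4A.";
const lastFour = [...colourText3.matchAll(lastFourRe)].map((m) => m[1]);
console.log(lastFour.join(" ")); // 2AC3 DE4A
```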