Matching certain number of specific letter in a word - regex

I am trying to match only the same ('reference') letter in a word. For example:
Makaraka
Wasagara
degenerescence
desilicification
odontonosology
There are 4 'a' in the first word and 6 'o' in the last one. How can I match all of them using a regex? I tried using a backreference, but I couldn't get it to work: the last "sample" letter was never matched. Is there a way to specify the number of occurrences for a capturing group? Thanks.

You can use this regex:
^.*?(\w)(?=(?:.*?\1){3}).*$
Explanation: This regex matches a word character in the input and captures it for the backreference \1. The lookahead (?=(?:.*?\1){3}) then ensures that there are at least 3 more occurrences of the captured word character.
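As a quick check of the pattern above, here is a minimal Python sketch. Python and the helper name letter_repeated_four_times are assumptions for illustration only; the question does not say which regex engine is in use. The helper returns the letter captured by group 1, or None when no letter occurs four times.

```python
import re

# Pattern from the answer above: capture one word character, then look
# ahead for at least 3 more occurrences of that same character.
PATTERN = re.compile(r'^.*?(\w)(?=(?:.*?\1){3}).*$')

def letter_repeated_four_times(word):
    """Return the captured letter that occurs at least 4 times, else None."""
    match = PATTERN.match(word)
    return match.group(1) if match else None

for word in ["Makaraka", "Wasagara", "degenerescence",
             "desilicification", "odontonosology", "banana"]:
    print(word, letter_repeated_four_times(word))
```

Note that the match is case-sensitive, so 'M' and 'm' count as different letters.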

How about:
(?:.*a){4,}
Just change the a to the letter you're searching for.
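If the letter and the count vary, the pattern can be built at run time. This is only a sketch in Python (an assumed language); has_at_least is a hypothetical helper name, not something from the answer.

```python
import re

def has_at_least(word, letter, count):
    """True if `letter` occurs at least `count` times in `word`."""
    # Builds (?:.*X){count,} for the chosen letter; re.escape guards
    # against letters that happen to be regex metacharacters.
    pattern = r'(?:.*%s){%d,}' % (re.escape(letter), count)
    return re.search(pattern, word) is not None

print(has_at_least("Makaraka", "a", 4))
print(has_at_least("banana", "a", 4))
```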


How to make a pattern that check that all the first letter should be capital?

I need a pattern in Angular that checks that the first letter of each word is a capital.
To do this, I am using this pattern:
pattern ="^([A-Z][a-z]*((\\s[A-Za-z])?[a-z]*)*)$"
1. It works only for the first letter.
2. When I have, for example, two words, it fails; I want to check the first letter of every word.
You can try to use this regex pattern:
^(\b[A-Z]\w*\s*)+$
Please try this regex pattern :
/([A-Z][\w-]*(\s+[A-Z][\w-]*)+)/
Based on https://stackoverflow.com/a/4113070/8090014
Your pattern works only for the first letter in the first word because the word has to start with an uppercase A-Z. But after that, the repeated group starts with \s[A-Za-z] which would also match a lowercase a-z.
Note that \s also matches a newline. If you don't want that, you could match either a space or a tab using a character class [ \t]
You could start the match with A-Z and also start the repeated group with A-Z. If you want to match words, you could match word characters using \w
^[A-Z]\w*(?:[\t ]+[A-Z]\w*)*$
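To see how that last pattern behaves, here is a small sketch in Python rather than Angular (an assumed language, used only to exercise the regex; every_word_capitalized is a hypothetical helper name).

```python
import re

# Every word must start with A-Z; words are separated by spaces or tabs only.
CAPITALIZED_WORDS = re.compile(r'^[A-Z]\w*(?:[\t ]+[A-Z]\w*)*$')

def every_word_capitalized(text):
    """True if each space/tab separated word starts with an uppercase A-Z."""
    return CAPITALIZED_WORDS.fullmatch(text) is not None

for sample in ["Hello World", "Hello world", "hello World", "Hello\nWorld"]:
    print(repr(sample), every_word_capitalized(sample))
```

The newline case is rejected because the separator class is [\t ] rather than \s.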

check if there is a word repeated at least 2 or more times. (Regular Expression)

Using a regular expression, I want to match
any line of input that has at least one word repeated two or more times.
Here is how far I got.
/(\b\w+\b).*\1
but it is wrong, because the backreference also matches a single character inside another word, not only a whole word.
input: i might be ill
output: < i might be i>ll
<> marks the matched part.
So I tried (\b\w+\b)(\b\w+\b)*\1
but it does not work at all.
Can someone help?
Thanks.
This should work:
(\b\w+\b).*\b\1\b
The greedy .* ensures the longest match. If you want the second instance to be a separate word, you have to add the boundaries there as well. So it's the same as
\b(\w+)\b.*\b\1\b
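The difference the extra boundaries make can be checked with a short sketch. Python is an assumed language here, and LOOSE, STRICT and matched_text are hypothetical names chosen for the illustration.

```python
import re

LOOSE = re.compile(r'(\b\w+\b).*\1')        # the asker's attempt
STRICT = re.compile(r'\b(\w+)\b.*\b\1\b')   # boundaries around \1 as well

def matched_text(pattern, line):
    """Return the text matched by `pattern` in `line`, or None."""
    m = pattern.search(line)
    return m.group(0) if m else None

print(matched_text(LOOSE, "i might be ill"))        # i might be i
print(matched_text(STRICT, "i might be ill"))       # None
print(matched_text(STRICT, "the cat saw the dog"))  # the cat saw the
```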
Positive lookahead is not a must here:
/\b([A-Za-z]+)\b[\s\S]*\b\1\b/g
EXPLANATION
\b([A-Za-z]+)\b # match any word
[\s\S]* # match any character (newline included) zero or more times
\b\1\b # word repeated
To check for repeated words you can use positive lookahead like this.
Regex: (\b[A-Za-z]+\b)(?=.*\b\1\b)
Explanation:
(\b[A-Za-z]+\b) will capture any word.
(?=.*\b\1\b) will look ahead to check whether the word captured by the group occurs again later. If it does, a match is found.
Note: This will produce repeated results, because a word that has already been matched is matched again when the regex engine reaches its next occurrence and yet another occurrence follows.
You will have to strip off the repeated results in your own code.
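Stripping the repeated results can be done in a few lines. The following is a sketch in Python (an assumed language); repeated_words is a hypothetical helper, and dict.fromkeys is used only because it removes duplicates while keeping first-seen order.

```python
import re

# Lookahead version from the answer above.
REPEATED = re.compile(r'(\b[A-Za-z]+\b)(?=.*\b\1\b)')

def repeated_words(line):
    """Return each word that occurs again later in the line, without duplicates."""
    hits = REPEATED.findall(line)      # may list the same word several times
    return list(dict.fromkeys(hits))   # de-duplicate, keeping order

print(REPEATED.findall("a b a c a b"))  # ['a', 'b', 'a']
print(repeated_words("a b a c a b"))    # ['a', 'b']
```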

Regex - Matching Strings with a Single Character

I'm fairly new to regex. I'm looking for an expression which will return results which meet the following criteria:
The First word must be 3 letters or more
The last word must be 3 characters or more
If any word or words in-between the first and last word contains ONLY 1 letter, then return that phrase
Every other word in-between the first and last word (apart from the single-letter words) must be 3 letters or more
I would like it to return phrases like:
'Therefore a hurricane shall arrive' and 'However I know I like Michael Smith'
There should be a space between each word.
So far I have:
^([A-Za-z]{3,})*$( [A-Za-z])*$( [A-Za-z]{3,})*$
Any help would be appreciated. Is it something to do with the spacing? I'm using an application called 'Oracle EDQ'.
In a normal regex world you'd use a \b, a word boundary.
^[a-zA-Z]{3,}(\s+|\b([a-zA-Z]|[a-zA-Z]{3,})\b)*\s+[a-zA-Z]{3,}$
                  ^^                       ^^
And perhaps, non-capturing groups (as anubhava shows).
From what I see, there are no word boundaries in Oracle EDQ regex syntax (as well as non-capturing groups). You should rely on the \s pattern, matching whitespace.
So, make it obligatory, either with
^[a-zA-Z]{3,}(\s+|\s([a-zA-Z]|[a-zA-Z]{3,}))*\s+[a-zA-Z]{3,}$
                  ^^
OR
^[a-zA-Z]{3,}(\s+|([a-zA-Z]|[a-zA-Z]{3,})\s)*\s*[a-zA-Z]{3,}$
                                         ^^    ^
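The first \s-based variant can be tried against the sample phrases from the question. This sketch uses Python, not Oracle EDQ, so it only shows how the pattern behaves in a conventional engine; is_valid_phrase is a hypothetical helper name.

```python
import re

# First \s-based variant from the answer: every inner word must be
# preceded by whitespace and be either 1 letter or 3+ letters long.
PHRASE = re.compile(
    r'^[a-zA-Z]{3,}(\s+|\s([a-zA-Z]|[a-zA-Z]{3,}))*\s+[a-zA-Z]{3,}$'
)

def is_valid_phrase(text):
    """True if `text` satisfies the pattern above."""
    return PHRASE.match(text) is not None

for phrase in ["Therefore a hurricane shall arrive",
               "However I know I like Michael Smith",
               "Therefore an hurricane",
               "A hurricane shall arrive"]:
    print(repr(phrase), is_valid_phrase(phrase))
```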
You can use this regex:
^[a-zA-Z]{3,}(?:\s+|(?:[a-zA-Z]|[a-zA-Z]{3,}))*\s+[a-zA-Z]{3,}$

Regular expression in Vim to match group capture

I want to find the words which contain the same string repeated twice.
(e.g. wookokss(ok/ok), ccsssscc(ss/ss)).
I think the expression is \(\w*\)\0.
Another try is to find the words which consist of the same string repeated twice. My answer is \<\(\w*\)\0\>. (word beginning + grouping(word) + group capture + word ending)
But they don't work. Could anybody help me?
To find a string of at least two characters that is repeated twice in a word, you can use
/\(\w\{2,}\)\1
To match a whole word which contains the aforementioned string, you can use
/\<\w\{-}\(\w\{2,}\)\1\w\{-}\>
A little bit of explanation:
\1 - matches the same string that was matched by the first sub-expression in \( and \) (\0 matches the whole matched pattern)
\{n,} - matches at least n of the preceding atom, as many as possible
\{-} - matches 0 or more of the preceding atom, as few as possible
\w - the word character ([0-9A-Za-z_])
\< - the beginning of a word
\> - the end of a word
More in :help pattern
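For readers more used to Perl-style syntax, the two Vim patterns above translate roughly as follows. This is a sketch in Python (an assumed language, not Vim script); the translation of \< and \> to \b and of \{-} to *? is my own, and the helper names are hypothetical.

```python
import re

# Vim: \(\w\{2,}\)\1
CONTAINS_DOUBLED = re.compile(r'(\w{2,})\1')
# Vim: \<\w\{-}\(\w\{2,}\)\1\w\{-}\>
WORD_WITH_DOUBLED = re.compile(r'\b\w*?(\w{2,})\1\w*?\b')

def doubled_part(word):
    """Return the repeated substring (2+ characters), or None."""
    m = CONTAINS_DOUBLED.search(word)
    return m.group(1) if m else None

def word_containing_doubled(text):
    """Return the first whole word that contains a doubled substring, or None."""
    m = WORD_WITH_DOUBLED.search(text)
    return m.group(0) if m else None

print(doubled_part("wookokss"))                     # ok
print(doubled_part("ccsssscc"))                     # ss
print(word_containing_doubled("say wookokss now"))  # wookokss
```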
1.) words which contain the same string repeated twice. (e.g. wookokss(ok/ok),
To find words containing two or more repeated word characters try
\(\w\{2,}\)\1
\1 matches what's captured in first group.
2.) find the words which consist of the same string repeated twice...
To capture \w\+ one or more word characters followed by \1 what's captured in first group
\<\(\w\+\)\1\>
should be about it. Have a look at this tutorial.
For the first one use (.{2,})\1 example here: https://regex101.com/r/gK0mM2/2
That is assuming that you only look for duplicate strings that have more than 1 character.
and for the second one ^(.{2,})\1$ example here: https://regex101.com/r/lC2yT7/2
Edit: changed the second expression, it now also looks for strings with at least 2 characters

Regex to match first word in sentence

I am looking for a regex that matches first word in a sentence excluding punctuation and white space. For example: "This" in "This is a sentence." and "First" in "First, I would like to say \"Hello!\""
This doesn't work:
"""([A-Z].*?(?=^[A-Za-z]))""".r
(?:^|(?:[.!?]\s))(\w+)
Will match the first word in every sentence.
http://rubular.com/r/rJtPbvUEwx
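A quick way to try this pattern is shown below. The question uses Scala, so this Python version is only an assumed stand-in for exercising the regex; first_words is a hypothetical helper name.

```python
import re

# A word at the start of the input, or right after ., ! or ? plus whitespace.
SENTENCE_START = re.compile(r'(?:^|(?:[.!?]\s))(\w+)')

def first_words(text):
    """Return the first word of every sentence in `text`."""
    return SENTENCE_START.findall(text)

print(first_words("This is a sentence. Another one! Third?"))
print(first_words('First, I would like to say "Hello!"'))
```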
This is an old thread but people might need this like I did.
None of the above works if your sentence starts with one or more spaces.
I did this to get the first (non empty) word in the sentence :
(?<=^[\s"']*)(\w+)
Explanation:
(?<=^[\s"']*) positive lookbehind in order to look for the start of the string, followed by zero or more spaces or punctuation characters (you can add more between the brackets), but do not include it in the match.
(\w+) the actual match of the word, which will be returned
The following words in the sentence are not matched as they do not satisfy the lookbehind.
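A variable-width lookbehind like the one above is not accepted by every engine; Python's standard re module, for instance, requires lookbehind patterns to have a fixed width. As an alternative sketch (a different technique from the lookbehind, and not the answerer's method), the leading characters can be consumed normally and the word read from a capturing group. Python and the name first_word are assumptions for illustration.

```python
import re

# Consume leading whitespace and quote characters, then capture the first word.
LEADING_THEN_WORD = re.compile(r'^[\s"\']*(\w+)')

def first_word(text):
    """Return the first word of `text`, skipping leading spaces and quotes."""
    m = LEADING_THEN_WORD.match(text)
    return m.group(1) if m else None

print(first_word('   "Hello," she said'))  # Hello
print(first_word("First, I would like"))   # First
```

The trade-off is that the overall match includes the skipped prefix, so the word has to be read from group 1 rather than from the whole match.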
You can use this regex: ^[^\s]+ or ^[^ ]+.
You can use this regex: ^\s*([a-zA-Z0-9]+).
The first word can be found at a captured group.
[a-z]+
This should be enough, as it will match the first run of a-z characters (assuming case-insensitive matching).
In case it doesn't work, you could try [a-z]+\b, or even ^[a-z]+\b, but the last one assumes that the string starts with the word.