Count number of words in a string - regex

How can I use a regex to match a string only when it contains 5 or more words?
Input1: stack over flow => the regex will not match anything
Input2: stack over flow stack over => the regex will match this string
I have tried counting the spaces with /\s/, but that didn't really help me, because I need to match only strings with 5 or more words.
Also, I don't want to split by spaces.

I would rely on whitespace/non-whitespace patterns and allow leading/trailing whitespace:
^\s*\S+(?:\s+\S+){4,}\s*$
See demo
Explanation:
^ - start of string
\s* - optional any number of whitespace symbols
\S+ - one or more non-whitespace symbols
(?:\s+\S+){4,} - 4 or more sequences of one or more whitespace symbols followed with one or more non-whitespace symbols
\s* - zero or more (optional) trailing whitespace symbols
$ - end of string
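As a quick check of the pattern above, here is a minimal Python sketch using the standard re module. The helper name has_five_words is made up for illustration and is not part of the answer.

```python
import re

# Pattern from the answer: optional leading whitespace, one word,
# then at least four more "whitespace + word" groups, optional trailing whitespace.
FIVE_WORDS = re.compile(r"^\s*\S+(?:\s+\S+){4,}\s*$")

def has_five_words(text):
    """Return True when text holds five or more whitespace-separated words.

    Hypothetical helper name, used only for this illustration.
    """
    return FIVE_WORDS.match(text) is not None

print(has_five_words("stack over flow"))             # False
print(has_five_words("stack over flow stack over"))  # True
print(has_five_words("  a b c d e  "))               # True
```

A "word" here is any run of non-whitespace characters, so punctuation attached to a word counts as part of that word.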

^ *\w+(?: +\w+){4,}$
You can use this regex. See demo.
https://regex101.com/r/cZ0sD2/15

To check if there are at least 5 words in a string:
(?:\w+\W+){4}\b
(?:\w+\W+){4} 4 words separated by non word characters
\b followed by a word boundary -> requires a 5th word
See demo at regex101
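Because this pattern is not anchored, it has to be used as a search rather than a full match. A small Python sketch, assuming the standard re module (the function name at_least_five_words is hypothetical):

```python
import re

# Four "word + separator" groups, then a word boundary that can only be
# satisfied by the start of a fifth word.
AT_LEAST_FIVE = re.compile(r"(?:\w+\W+){4}\b")

def at_least_five_words(text):
    """Hypothetical helper: True when text contains five or more \\w+ words."""
    return AT_LEAST_FIVE.search(text) is not None

print(at_least_five_words("stack over flow"))             # False
print(at_least_five_words("stack over flow stack over"))  # True
print(at_least_five_words("one two three four "))         # False
```

Unlike the whitespace-based patterns, this one treats any non-word character as a separator, so "a,b,c,d,e" also counts as five words.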

Related

Regex for finding words containing more than 3 'a' characters

I need to write a regex that will find all words with 3 or more 'a' letters. Suppose that each word is on a new line.
Example of correct words:
Anagram
Assassination
Abaca
I ended up with something like this:
^([^aA]*a[^aA]*a[^aA]*a)$
But it does not work correctly if there are more than 3 'a' letters or if the word starts with 'a'.

I would keep it simple and just use:
\b\w*[Aa]\w*[Aa]\w*[Aa]\w*\b
Demo
This regex pattern matches any word containing at least three a/A characters (lowercase or uppercase), appearing anywhere in the word.
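To see it on the sample words, here is a short Python sketch with the standard re module. The extra words Banana, Apple and alpha are my own additions for contrast, not from the question.

```python
import re

# Pattern from the answer: a whole word containing at least three a/A characters.
THREE_A = re.compile(r"\b\w*[Aa]\w*[Aa]\w*[Aa]\w*\b")

# One word per line, as in the question; the last three are added for contrast.
text = "Anagram\nAssassination\nAbaca\nBanana\nApple\nalpha"

print(THREE_A.findall(text))
# ['Anagram', 'Assassination', 'Abaca', 'Banana']
```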

Here is what I tried:
^(?i)(?:[b-z]*a){3}[a-z]*$
See an online demo
^ - Start line anchor.
(?i) - Match rest case-insensitive.
(?:[b-z]*a){3} - A non-capture group that matches 0+ characters in the range b-z up to a literal "a", repeated three times.
[a-z]* - Match any possible remainder.
$ - End line anchor.

If you want to use the anchors, you can add matching .* at the end, and add \n to the negated character class to prevent crossing newlines.
^[^aA\n]*[aA][^aA\n]*[aA][^aA\n]*[aA].*$
Regex demo
Or a bit shorter
^(?:[^aA\n]*[aA]){3}.*$
Regex demo
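For the anchored variant, the ^ and $ anchors need multiline mode when each word sits on its own line. A minimal Python sketch, assuming the standard re module and made-up sample input:

```python
import re

# Shorter anchored pattern from the answer; re.MULTILINE makes ^ and $
# match at every line start/end instead of only at the string boundaries.
LINE_THREE_A = re.compile(r"^(?:[^aA\n]*[aA]){3}.*$", re.MULTILINE)

# Sample input with one word per line (an assumption for this illustration).
words = "Anagram\nApple\nAbaca\nalpha"

matches = [m.group(0) for m in LINE_THREE_A.finditer(words)]
print(matches)  # ['Anagram', 'Abaca']
```

The \n inside the negated class is what stops "Apple" from borrowing the a's of the following line.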

What is the Regex pattern "words with numbers in them" but not a number itself

What does a regex look like for
Input:
Rood Li-Ion 12 G6
Match:
"Rood" "Li-Ion" "G6"
1. I tried
\b[\w-]+\b /g
But that also matches "12"!
2. I tried
/([0-9]+)?[a-zA-Zê]/
But that didn't match G6.
I want all words, even if they have a number in them, but I don't want number-only tokens to match. How is this possible? Whitespace must not be part of the match either.
"Rood Li-Ion 12 G6" should become the 3 strings "Rood", "Li-Ion", "G6".

You can use
(?<!\S)(?!\d+(?!\S))\w+(?:-\w+)*(?!\S)
See the regex demo. It matches chunks of text between whitespace or the start/end of the string, and only when the non-whitespace chunk does not consist solely of digits.
Also, unlike your original regex, it won't match a streak of hyphens.
Details
(?<!\S) - a left whitespace boundary
(?!\d+(?!\S)) - a negative lookahead that fails the match if one or more digits followed by whitespace or the end of string appear immediately to the right
\w+(?:-\w+)* - one or more word chars followed with zero or more repetitions of - and one or more word chars
(?!\S) - a right whitespace boundary
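A quick way to try this on the input from the question is the following Python sketch. It assumes the standard re module, which accepts these lookarounds because the lookbehind is a fixed single character.

```python
import re

# Pattern from the answer: a whitespace-delimited chunk of word chars
# (optionally hyphen-joined) that is not made up of digits only.
WORD_NOT_NUMBER = re.compile(r"(?<!\S)(?!\d+(?!\S))\w+(?:-\w+)*(?!\S)")

print(WORD_NOT_NUMBER.findall("Rood Li-Ion 12 G6"))
# ['Rood', 'Li-Ion', 'G6']
```

Since the pattern has no capturing groups, findall returns the full matches, with no surrounding whitespace.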

This should suit your needs:
\b[\w-]*[a-zA-Z][\w-]*\b

A regex for letters and space that cannot be a whitespace

I cannot figure out how to combine two regexes. I have these requirements:
Letters and space ^[\p{L} ]+$
Cannot be whitespace ^[^\s]+$
How can I write one regex that combines both? Or is there perhaps some other solution?

You may use any of the following:
^(?! +$)[\p{L} ]+$
^(?!\s+$)[\p{L}\s]+$
^\s*\p{L}[\p{L}\s]*$
Details
^ - start of string
(?!\s+$) - a negative lookahead that fails the match if the string consists only of 1 or more whitespaces up to the end of the string
[\p{L}\s]+ - 1+ letters or whitespaces
$ - end of string.
See the regex demo.
The ^\s*\p{L}[\p{L}\s]*$ is a regex that matches any 0+ whitespaces at the start of the string, then requires a letter that it consumes, and then any 0+ letters/whitespaces may follow.
See the regex demo.
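If you want to try the last pattern in Python, note that the standard re module does not understand \p{L}. The sketch below swaps in [^\W\d_] as an approximation of "any letter"; that substitution and the function name is_valid are my assumptions, not part of the answer.

```python
import re

# Assumption: [^\W\d_] (a word char that is neither a digit nor an underscore)
# stands in for \p{L}, which the standard re module does not support.
LETTERS_AND_SPACES = re.compile(r"^\s*[^\W\d_](?:[^\W\d_]|\s)*$")

def is_valid(text):
    """Hypothetical helper: letters and whitespace only, with at least one letter."""
    return LETTERS_AND_SPACES.match(text) is not None

print(is_valid("John Smith"))   # True
print(is_valid("   "))          # False
print(is_valid("abc1"))         # False
```

Engines with real Unicode property support (for example PCRE, .NET or Java) can use the \p{L} patterns as written.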

how to match a list of fixed length words separated by space or comma?

Each word's length could be 2 or 6-10, and the words could be separated by a space or a comma. The words contain only letters and are not case sensitive.
Here is a group of words that should be matched:
RE,re,rereRE
Not matching groups:
RE,rere,rel
RE,RERE
Here is the pattern that I have tried:
((([a-zA-Z]{2})|([a-zA-Z]{6,10}))(,|\s+)?)
But unfortunately this pattern can also match a string like this: RE,RERE
It looks like the word boundary has not been set.

You could match chars a-z either 2 or 6-10 times using an alternation, then repeat that pattern 0+ times, each time preceded by a comma or a space [ ,].
^(?:[A-Za-z]{6,10}|[A-Za-z]{2})(?:[, ](?:[A-Za-z]{6,10}|[A-Za-z]{2}))*$
Explanation
^ Start of string
(?:[A-Za-z]{6,10}|[A-Za-z]{2}) Match chars a-z 6 -10 or 2 times
(?: Non capturing group
[, ](?:[A-Za-z]{6,10}|[A-Za-z]{2}) Match comma or space and repeat previous pattern
)* Close non capturing group and repeat 0+ times
$ End of string
Regex demo
If lookarounds are supported, you could instead assert that what is directly to the left and to the right of the match is not a non-whitespace character \S (that is, use whitespace boundaries).
(?<!\S)(?:[A-Za-z]{6,10}|[A-Za-z]{2})(?:[ ,](?:[A-Za-z]{6,10}|[A-Za-z]{2}))*(?!\S)
Regex demo
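Here is a small Python sketch of the anchored pattern run against the examples from the question. It assumes the standard re module; the names WORD, WORD_LIST and is_word_list are made up for illustration.

```python
import re

# One word: 6-10 letters or exactly 2 letters (building block of the answer's pattern).
WORD = r"(?:[A-Za-z]{6,10}|[A-Za-z]{2})"

# A word, followed by zero or more "separator + word" groups, anchored at both ends.
WORD_LIST = re.compile(rf"^{WORD}(?:[, ]{WORD})*$")

def is_word_list(text):
    """Hypothetical helper: True when every word has length 2 or 6-10."""
    return WORD_LIST.match(text) is not None

print(is_word_list("RE,re,rereRE"))  # True
print(is_word_list("RE,rere,rel"))   # False
print(is_word_list("RE,RERE"))       # False
```

Making the separator mandatory between words is what rejects RERE: it can no longer be read as two adjacent 2-letter words.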

([a-zA-Z]{2}(,|\s)|[a-zA-Z]{6,10}|(,|\s))

This one will get only the words that have 2 letters, or between 6 and 10 letters:
\b,?([a-zA-Z]{6,10}|[a-zA-Z]{2}),?\b

You can use this
^(?!.*\b[a-z]{4}\b)(?:(?:[a-z]{2}|[a-z]{6,10})(?:,|[ ]+)?)+$
Regex Demo

This regex will match your first case, but neither of your two other cases:
^((([a-zA-Z]{2})|([a-zA-Z]{6,10}))(,|[ ]+|$))+$
I'm making the assumption here that each line should be a single match.
Here it is in action.

match only letters after comma without numbers

I'm using regex to match certain text after selecting it with XPath.
For example: Huntsville, Alabama 11111
I want only Alabama, which always comes after the comma.
I use [^,]*$ to get the text after the comma,
but I can't find a way to exclude the numbers and return only the letters.
Another example: when I want to get the numbers after the comma, I use [^[0-9],]*$
but when I tried to tweak it for anything else, it returned only numbers or nothing.

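(?<=,\s*)[a-zA-Z]+ You can try this, provided your regex engine supports variable-length lookbehind (for example .NET or modern JavaScript).
Explanation:
(?<=...) => lookbehind: asserts what precedes the match without including it in the match
,\s* => match comma followed by 0 or more whitespace characters
[a-zA-Z]+ => match letters only (one or more)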
HTH

To match a letter word after the last comma, you may use
[a-zA-Z]+(?=[^,]*$)
See the regex demo.
Details
[a-zA-Z]+ - 1 or more ASCII letters
(?=[^,]*$) - followed with 0+ chars other than , up to the end of the string.
To match 1 or more words in the same context, use
[a-zA-Z]+(?:\s+[a-zA-Z]+)*(?=[^,]*$)
         ^^^^^^^^^^^^^^^^^
See this regex demo.
The (?:\s+[a-zA-Z]+)* part matches zero or more consecutive occurrences of 1+ whitespaces and 1+ ASCII letters.
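A minimal Python sketch of the multi-word variant, assuming the standard re module. The helper name letters_after_last_comma and the second sample string are my own, not from the question.

```python
import re

# Multi-word pattern from the answer: letter words separated by whitespace,
# followed only by non-comma characters up to the end of the string.
AFTER_LAST_COMMA = re.compile(r"[a-zA-Z]+(?:\s+[a-zA-Z]+)*(?=[^,]*$)")

def letters_after_last_comma(text):
    """Hypothetical helper: return the letter-only words after the last comma, or None."""
    m = AFTER_LAST_COMMA.search(text)
    return m.group(0) if m else None

print(letters_after_last_comma("Huntsville, Alabama 11111"))   # Alabama
print(letters_after_last_comma("Santa Fe, New Mexico 87501"))  # New Mexico
```

Note that a string with no comma at all still matches from its first letter word, because the lookahead only checks that no comma follows.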
