Word count regex that only allows alphanumeric characters and a maximum word length

I've spent the whole morning on gSkinner trying to change this regex. It correctly allows at most 15 words, but how do I further limit the input to alphanumeric characters only, with no word longer than 25 characters?
I understand [a-z0-9], but the word boundaries seem to confuse me, because whatever I change, I break it.
^\W*(?:\w+\b\W*){1,15}$
It's for use in JavaScript/PHP.

Try this regex: ^((\w{1,25}))((\W\w{1,25}){1,14}|)$
The first word is not preceded by a separator; (\w{1,25}) checks it. After that I want a separator followed by a word, (\W\w{1,25}), and I want this 1 to 14 times, so (\W\w{1,25}){1,14}. If the input has only one word, this second part will not match, so instead of a separator followed by a word I also allow nothing, which is why I added the |: ((\W\w{1,25}){1,14}|). The closing $ makes sure nothing follows the last allowed word, so a 16th word or a word longer than 25 characters makes the whole match fail. Note that \W matches any non-word character, not only a space.
EDIT
The pattern had a glitch with - and similar characters, so I updated it to separate words by spaces only (note that [^ ] accepts any non-space character, not just alphanumerics): ^([^ ]{1,25})(([ ]{1,}([^ ]{1,25}|)){1,14}|)$
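A minimal way to try the word-count answer, assuming Node.js or a modern browser: the sketch uses the answer's first pattern with a $ anchor at the end (without it, the pattern can succeed on just the first 15 words of a longer input) and compares it with a stricter variant. That variant is an assumption, not taken from the answers: it allows only ASCII letters and digits and separates words by whitespace. The helper name words is hypothetical.

```javascript
// Answer's pattern, with a trailing $ so nothing may follow the 15th word.
const answerPattern = /^((\w{1,25}))((\W\w{1,25}){1,14}|)$/;

// Hypothetical stricter variant: ASCII letters/digits only, whitespace-separated.
const strictPattern = /^[A-Za-z0-9]{1,25}(?:\s+[A-Za-z0-9]{1,25}){0,14}$/;

// Hypothetical helper: builds "w0 w1 ... w(n-1)" to test the word limit.
const words = (n) => Array.from({ length: n }, (_, i) => "w" + i).join(" ");

const inputs = [
  words(15),      // 15 short words: within the limit
  words(16),      // one word too many
  "a".repeat(25), // longest allowed word
  "a".repeat(26), // word longer than 25 characters
  "foo-bar",      // \W treats "-" as a separator; the strict class rejects it
  "hello_world",  // \w includes "_", so only the answer's pattern accepts it
];

for (const s of inputs) {
  console.log(JSON.stringify(s), answerPattern.test(s), strictPattern.test(s));
}
```

Both patterns accept the 15-word and 25-character inputs and reject the 16-word and 26-character ones; they differ only on foo-bar and hello_world, because \w includes the underscore and \W accepts any non-word character as a separator. Neither pattern uses JavaScript-only syntax, so the same pattern text should also work with PHP's preg_match between / delimiters.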

Related

.NET regex to look ahead and eliminate strings in advance that don't contain certain characters

I am using the .NET flavor of regex.
Suppose I have the string 123456789AB
and I want to match AB (it could be any two capital letters) only if the part of the string containing numbers (123456789) has 5 and 8 in it.
So what I came up with was
(?=5)(?=8)([A-Z]{2})
But this is not working.
After some trial and error on RegexStorm
I got to
(?=(.*5))(?=(.*8))[A-Z]{2}
I expected it to start matching from the beginning of the string, since a lookahead does not consume any characters.
But the [A-Z]{2} part does not move ahead to match AB in the input string.
Why is that?
I know that replacing it with .*[A-Z]{2} makes it move ahead, but then the matched string contains the entire string.
What is the solution in this case, other than putting the letters part ([A-Z]{2}) in a separate group and capturing only that group?
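Lookaheads check for a pattern match immediately to the right of the current position in the string. (?=(.*5)) matches a location that is followed by any 0 or more characters other than line break characters, as many as possible, and then 5; (?=(.*8)) then performs a similar check at the same position, requiring 8 after any 0 or more characters. Both checks succeed only at positions that still have a 5 and an 8 somewhere to their right, and at those positions the next two characters are digits, so [A-Z]{2} fails. At the position just before AB the letters would match, but there is no 5 or 8 ahead any more, so the lookaheads fail there.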
You may use as many lookbehinds as there are required substrings before the two letters:
(?s)(?<=5.*?)(?<=8.*?)[A-Z]{2}
Details
(?s) - makes the . match newline characters, too
(?<=5.*?) - a location that is preceded by 5 and then 0 or more chars, as few as possible
(?<=8.*?) - a location that is preceded by 8 and then 0 or more chars, as few as possible
[A-Z]{2} - two ASCII uppercase letters.
An alternative would be to "unfold" what you expect to match using exclusionary character classes and alternation of match order. Not pretty, but pretty fast:
(?<=\b[^58]*?(?:5[^8]*8|8[^5]*5)[^A-Z]*?)[A-Z]{2}
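To see the difference between the two approaches outside RegexStorm, here is a small sketch in JavaScript. It is only an assumption that JavaScript behaves like .NET here: since ES2018 it also supports variable-length lookbehind, but it does not accept the leading (?s) inline option, so the s flag is used instead. The string 1234567AB and the variable names are made up for illustration.

```javascript
const input = "123456789AB";

// The question's attempt: the lookaheads only succeed at positions that still
// have a 5 and an 8 ahead, and there the next two characters are digits.
const lookaheadOnly = /(?=(.*5))(?=(.*8))[A-Z]{2}/;

// First answer: one lookbehind per required digit; "s" replaces .NET's (?s).
const lookbehind = /(?<=5.*?)(?<=8.*?)[A-Z]{2}/s;

// Second answer: the "unfolded" lookbehind with exclusionary character classes.
const unfolded = /(?<=\b[^58]*?(?:5[^8]*8|8[^5]*5)[^A-Z]*?)[A-Z]{2}/;

console.log(lookaheadOnly.exec(input));    // null
console.log(lookbehind.exec(input)[0]);    // AB
console.log(unfolded.exec(input)[0]);      // AB
console.log(lookbehind.exec("1234567AB")); // null: there is no 8 before the letters
```

Because the digits are checked inside lookbehinds, the match value itself is just AB, so no extra capturing group is needed.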

Regex: remove any chars or numbers before a needle

I am trying to build a regex pattern to extract a number from a string whose content is unknown and can be different every time.
Because I never know what the string looks like, here are some common examples:
12cm iamtext 311
iamtext 311 12 cm iamtext 311
iamtext 311 12cm
In short, I want the number directly before cm, or before cm with a space in between (e.g. 12 cm). The number can have any number of digits, so it could also be something like 12414 cm; in that case I want 12414.
But if there is something like iamtext311 cm, I don't want to get anything back, because in this case the number belongs to the text. If there is a space between the text and the number, however, I want to get the 311.
This is what I got so far:
.*?\d+.*?(\d+)
But this doesn't work when there are letters, and at the moment I don't know how to proceed, because the situation is so complex, especially with all the different cases with and without a space.
Would appreciate any kind of help!
How about using \b with an optional space character?
\b\d+\s?cm\b
DEMO: https://regex101.com/r/fsp3FS/10
Split the problem.
The number is obtained with the obvious \d+.
You don't want it preceded by any character but spacing characters: (?<!\S).
It must be followed by an optional space and then the characters cm: (?=\s?cm).
Put it together: (?<!\S)\d+(?=\s?cm).
In your pattern .*?\d+.*?(\d+) you don't account for the cm part.
Instead, you could assert the start of the string or match one or more whitespace characters, and use a capturing group for the digits.
To prevent cm from being part of a longer word, you could add a word boundary \b:
(?:^|\s+)(\d+) ?cm\b
If you don't want \s+ to match newlines, you could use a character class that matches only spaces and tabs, [ \t], instead.
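The three suggestions can be compared on the question's examples with a short JavaScript sketch. Assumptions: a string may contain more than one number followed by cm, so the g flag is used and every match is collected; the helper names are hypothetical; and 12414 cm is added from the question's description.

```javascript
// Hypothetical helper names; each returns every match found in the string.
const samples = [
  "12cm iamtext 311",
  "iamtext 311 12 cm iamtext 311",
  "iamtext 311 12cm",
  "iamtext311 cm", // digits glued to text: nothing should be returned
  "12414 cm",
];

// 1) \b with an optional space: the match includes "cm" itself.
const withBoundary = (s) => s.match(/\b\d+\s?cm\b/g) ?? [];

// 2) Lookbehind/lookahead: the match is only the digits.
const lookaround = (s) => s.match(/(?<!\S)\d+(?=\s?cm)/g) ?? [];

// 3) Start of string or whitespace, digits taken from capturing group 1.
const captured = (s) =>
  [...s.matchAll(/(?:^|\s+)(\d+) ?cm\b/g)].map((m) => m[1]);

for (const s of samples) {
  console.log(JSON.stringify(s), withBoundary(s), lookaround(s), captured(s));
}
```

The first pattern's match includes cm itself (for example 12cm or 12 cm), while the lookaround version returns only the digits and the third version returns them in capturing group 1. On iamtext311 cm all three return nothing, because the digits are glued to the preceding text.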

Regex ignore first x characters and then match pattern

String = '11111111111110000000000000000000110000000000000011111111111111111111111111111111110011111111111110000011110000011111111111110000000000011111111111111111010001111111111111111111110011111111111111111111111111110111112111121111111111111111111000011000001011111111111101022111101111001111111111110000001000000111111111111111000000000000011111111111111100011111111001011111111100000000000000000000000000000000100111001000000000000000000011000000000000001111111000000000000000000000000000000000001111100000000000000000000011000000000000000000000010000000000333333333'
I want a pattern that takes the 10 characters after the first 100 (characters 100 to 110) and then checks whether that 10-character string contains 4 zeros in a row.
How can I do this with regex alone? Until now I have been using substring.
You could use this:
^.{100}(?=.{0,6}0000)(.{10})
Explanation:
^: matches the start of the string, so that the pattern cannot match anywhere else in the input
.{100}: match 100 characters
(?= ): look ahead. This does not capture, but just verifies something that is still ahead.
.{0,6}: 0 to 6 characters
0000: literally 4 zeroes
(.{10}): 10 characters, this time they are captured and can be referenced back with \1 or $1 depending on the flavour of regex.
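The above answer is perfect, but its match includes the first 100 characters as well.
To skip the first 100 characters, we can use a lookbehind anchored to the start of the string:
(?<=^.{100})
To check for the pattern only in the 10 characters that follow the first 100, we can use
(?<=^.{100})(?=.{0,6}0000)(.{10})
Without the ^, (?<=.{100}) holds at every position after the 100th character, so if characters 100 to 110 contain no 0000, the engine moves on and can match a later 10-character window instead.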
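Here is a JavaScript sketch of the first answer and the lookbehind variant. To keep the positions easy to check, it uses made-up strings built with repeat() rather than the long string from the question. It also compares an anchored lookbehind, (?<=^.{100}), with the unanchored (?<=.{100}): without the ^, the engine can report a window that starts after position 100.

```javascript
const firstAnswer = /^.{100}(?=.{0,6}0000)(.{10})/;        // match includes the first 100 chars
const skipAnchored = /(?<=^.{100})(?=.{0,6}0000)(.{10})/;  // match is only chars 100-109
const skipUnanchored = /(?<=.{100})(?=.{0,6}0000)(.{10})/; // may start after position 100

// Characters 100-109 are "1100001111", which contains 0000.
const hit = "1".repeat(100) + "1100001111" + "1".repeat(20);
// Characters 100-109 are all 1s; the zeros only start at position 110.
const miss = "1".repeat(100) + "1".repeat(10) + "0000" + "1".repeat(20);

console.log(firstAnswer.exec(hit)[1]);        // 1100001111
console.log(skipAnchored.exec(hit)[0]);       // 1100001111
console.log(skipAnchored.exec(miss));         // null
console.log(skipUnanchored.exec(miss).index); // 104, a window outside 100-109
```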

Regular expression to match a string within the first five words of an input sentence

I want to match specific strings from the beginning of an article title up to its fifth word.
Input string:
The 14 best US colleges in the West are dominated by California — here's who makes the cut.
regex:
/^.*(\bbest\b|\btop\b|\bhot\b).*$/
Currently it matches the whole article title, but I want to search only up to "colleges".
I also need to ignore strings like laptop, hot-spot, etc.
You can use this expression
^((?:\w+\s?){1,5}).*
Explanation:
^ assert position at start of the string
\w+ match one or more word characters
\s? optionally match one whitespace character
{1,5} Quantifier - Between 1 and 5 times, as many times as possible
.* matches any character (except newline)
This matches the first 5 words (and spaces).
^(\w+\s){0,4}\b(best|top|hot)(\s|$)
You want to match a string within the first five words of the input sentence. Counting from the start of the sentence, there must then be 0 to 4 words before the word you want to match, so you need ^(\w+\s){0,4} before the specific words you want to match. See https://regex101.com/r/nS0dU6/4
regex101 comes to help again.
^(?=(?:\w+\s){0,4}?(?:best|top|hot)\b(?!-))(\w+(?:\s\w+){0,4})
(?=(?:\w+\s){0,4}?(?:best|top|hot)\b(?!-)) checks that the keyword is within the first 5 words (note that (?!-) is added to cater for words such as hot-spot)
(\w+(?:\s\w+){0,4}) then matches at most the first 5 words
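The two keyword patterns can be checked with this JavaScript sketch against the first part of the title from the question (up to California) and two made-up titles. The made-up titles are assumptions, chosen to exercise the laptop and hot-spot cases; the variable names are hypothetical.

```javascript
// Second answer: 0-4 words, then the keyword, then whitespace or end of input.
const keywordAfterWords = /^(\w+\s){0,4}\b(best|top|hot)(\s|$)/;

// Third answer: lookahead checks the keyword, group 1 captures up to 5 words.
const firstFiveWithKeyword =
  /^(?=(?:\w+\s){0,4}?(?:best|top|hot)\b(?!-))(\w+(?:\s\w+){0,4})/;

const title = "The 14 best US colleges in the West are dominated by California";
const laptop = "My new laptop is the best one";     // "best" is the 6th word
const hotSpot = "Where to find a hot-spot in town"; // only "hot-spot"

console.log(keywordAfterWords.exec(title)[2]);    // best
console.log(firstFiveWithKeyword.exec(title)[1]); // The 14 best US colleges
for (const s of [laptop, hotSpot]) {
  console.log(keywordAfterWords.test(s), firstFiveWithKeyword.test(s)); // false false
}
```

Both answers treat a word as a run of \w characters followed by a single whitespace character, so punctuation such as a comma or colon within the first five words stops the match; that is worth testing against real titles.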

regular expression to match six spaces followed by up to 31 alphanumerics

It's getting towards the end of the day and this is annoying me - one day I'll find the time to learn regex properly as I know it can save a lot of time when extracting info from text.
I need to match strings that match the following signature:
6 spaces followed by up to 31 alphanumerics (or spaces), and then no more alphanumeric text on that line.
E.g.
' sampleheading ' - is fine
' sampleheading 10^21/1 ' - should not match
' sampleheading sample ' - should not match
I've got ^(\s{6}[\w\s]{1,31}) matching the first bit correctly, I think, but I can't get it to select only lines that have no text following the initial match.
Any help appreciated!
Edit:
I've updated the text, since a number of you noted that my hastily entered original samples would actually all have tested fine.
Use $ to match end of line:
^(\s{6}[\w\s]{1,31})$
Or, if you may still have spaces afterwards that you want to ignore:
^(\s{6}[\w\s]{1,31})\s*$
You can use a $ to indicate the end of a line, using \s* to allow optional whitespace at the end.
^\s{6}[\w\s]{1,31}\s*$
Your samples don't match what you say you want, however. They start with only four spaces rather than six, and in the last sample, "sampleheading sample" is within the 31-character limit, so it matches too. (The middle sample is within the length, too, but has non-word characters in it, so it doesn't match.) Is that what you want?
Add a $ to match the end of the line, e.g.
^(\s{6}[\w\s]{1,31})$
Aren't you simply saying 'match 6 spaces followed by 31 alphanumerics'? There's no concept there of 'and no more alphanumerics'.
I think what you have is good so far (!), but you need to follow it with (say) [^\w] - i.e. 'not an alphanumeric'.
Try this one out:
^\s{6}[\w\s]{1,31}\W.*$
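Here is a JavaScript sketch of the two $-anchored suggestions, run one line at a time (for example after text.split("\n")). Testing line by line matters because \s inside [\w\s] also matches a newline, so with the m flag over a whole multi-line string a match could spill into the following line. The sample lines are made up, with the leading spaces written out via repeat() so their count is explicit.

```javascript
// The two $-anchored suggestions; each line is tested on its own.
const exact = /^(\s{6}[\w\s]{1,31})$/;
const trailingOk = /^(\s{6}[\w\s]{1,31})\s*$/;

const six = " ".repeat(6);
const cases = {
  sixSpacesWord: six + "sampleheading",           // should match
  withSymbols: six + "sampleheading 10^21/1",     // ^ and / are not \w or \s
  twoWords: six + "sampleheading sample",         // 20 chars: still matches
  fourSpaces: " ".repeat(4) + "sampleheading",    // too few leading spaces
  fullPlusTrailing: six + "a".repeat(31) + "   ", // 31 chars, then trailing spaces
};

for (const [name, line] of Object.entries(cases)) {
  console.log(name, exact.test(line), trailingOk.test(line));
}
```

As an earlier answer points out, both patterns accept sampleheading sample, because it is still within 31 characters. They differ only on the last case: \s* lets trailing spaces go beyond the 31-character budget of [\w\s]{1,31}.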