Regex - Match unlimited no. of words of fixed length

I'm fairly new to regex. I'm looking for an expression that matches strings in which the first word is a single letter, followed by an unlimited number of words of 3 or more letters.
There should be a space between each word.
So far I have:
([A-Za-z]{1,1} [A-Za-z+]{3,100})
As it stands this only returns phrases such as:
'I will' and 'A bird'
But I would like it to return phrases like:
'I will always try' and 'A bird flew into the cage'
Any help would be appreciated. I'm using an application called 'Oracle EDQ'.

You need to apply the limiting quantifier {3,} to the [A-Za-z] character class and the * quantifier (zero or more repetitions) to the outer group that matches a space followed by a word of 3 or more letters:
^[A-Za-z]( [A-Za-z]{3,})*$
Note the use of the anchors ^ and $, which are very important when you need to match characters at a specific place (here, at the start and end of the string).
Regex matches:
^ - checks the regex engine is at the beginning of a string
[A-Za-z] - Exactly 1 letter a to z and A to Z
( [A-Za-z]{3,})* - zero or more sequences of:
  " " - a space
  [A-Za-z]{3,} - 3 or more ASCII letters
$ - end of string.
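As a quick check of the breakdown above, here is a minimal Python sketch. The choice of Python's re module and the helper name is_phrase are assumptions for illustration only; Oracle EDQ may use a different regex flavor. The last two sample strings are made up to show a rejected phrase and the zero-words case.

```python
import re

# Pattern from the answer: one letter, then zero or more " word" groups of 3+ letters.
PATTERN = re.compile(r"^[A-Za-z]( [A-Za-z]{3,})*$")


def is_phrase(text: str) -> bool:
    """Return True when the whole string fits the pattern (hypothetical helper)."""
    return PATTERN.match(text) is not None


samples = ["I will always try", "A bird flew into the cage", "I am here", "I"]
results = {s: is_phrase(s) for s in samples}
for s, ok in results.items():
    print(f"{s!r}: {ok}")
```

"I am here" is rejected because "am" has fewer than 3 letters, while a lone "I" is accepted because * allows zero following words.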

You can use this regex:
^[A-Za-z](?: [A-Za-z]{3,})+$

Related

Regex to identify for values other than alphanumeric values which can have hyphen or dot in between them but not at beginning or at end

I am new to the regular expressions. I have seen other quite close posts with a similar question but as you are aware in RegEx even dot matters a lot so here I am posting this question to seek help in this particular scenario.
1. My SQL column value can contain a-z, A-Z, and 0-9.
2. It can have a dot (.) or a hyphen (-) in between, but neither may appear at the beginning or at the end.
3. It cannot contain spaces, tabs, or any other blanks anywhere in the column value.
4. It cannot start or end with any special character, not even a dot or a hyphen.
I wrote this query which covers the 1st, 2nd, and 3rd points but fails in the 4th case.
select * from test_db.xtmp_testtable_invalidchars042321_rg where (sl_id REGEXP '[^[:alnum:]].+$')
**Table column input values**
RaghavGupta
.RaghavGupta
#Raghav.Gupta
"Raghav Gupta"
Raghav Gupta
Raghav#Gupta
Raghav$Gupta
Raghav%Gupta
Raghav*Gupta
Raghav.Gupta
RaghavGupta
RaghavGupta$
RaghavGupta.
RaghavGupta[]
**Query Result**
RaghavGupta
.RaghavGupta
#Raghav.Gupta
"Raghav Gupta"
Raghav Gupta
Raghav#Gupta
Raghav$Gupta
Raghav%Gupta
Raghav*Gupta
Raghav.Gupta
"RaghavGupta "
RaghavGupta[]
You can use NOT with the matching regex:
select * from test_db.xtmp_testtable_invalidchars042321_rg where (sl_id NOT REGEXP '^[[:alnum:]]+([.-][[:alnum:]]+)*$')
The pattern matches
^ - start of string
[[:alnum:]]+ - one or more alphanumeric chars ([:alnum:] is a POSIX character class that matches letters and/or digits)
([.-][[:alnum:]]+)* - (a capturing group that matches) zero or more repetitions of:
  [.-] - a . or -
  [[:alnum:]]+ - one or more alphanumeric chars
$ - end of string.
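The same validation can be sketched outside the database. This is only an approximation: Python's re module has no POSIX classes, so [A-Za-z0-9] is used here as an assumed ASCII stand-in for [[:alnum:]]. The names VALID, values, valid and invalid are hypothetical.

```python
import re

# ASCII stand-in for [[:alnum:]] (assumption: Python's re has no POSIX classes).
VALID = re.compile(r"^[A-Za-z0-9]+([.-][A-Za-z0-9]+)*$")

# Sample values from the question; "Raghav-Gupta" is added here for illustration.
values = [
    "RaghavGupta", ".RaghavGupta", "#Raghav.Gupta", "Raghav Gupta",
    "Raghav#Gupta", "Raghav.Gupta", "Raghav-Gupta", "RaghavGupta.",
    "RaghavGupta[]", "RaghavGupta ",
]

# Mirrors NOT REGEXP: keep the rows that fail the "valid" pattern.
invalid = [v for v in values if VALID.fullmatch(v) is None]
valid = [v for v in values if v not in invalid]

print("valid:  ", valid)
print("invalid:", invalid)
```

Writing the pattern for valid values and negating the test tends to be easier to reason about than trying to describe every invalid shape directly.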

Regex match for multiple characters

I want to write a regex pattern to match a string that starts with "Z", where the next 2 characters are not "IU", followed by any other characters.
I am using this pattern but it is not working: Z[^(IU)]+.*$
ZISADR - should match
ZIUSADR - should not match
ZDDDDR - should match
Try this regex:
^Z(?:I[^U]|[^I]).*$
Explanation:
^ - asserts the start of the line
Z - matches Z
I[^U] - matches I followed by any character that is not a U
| - OR
[^I] - matches any character that is not an I
.* - matches 0+ occurrences of any character that is not a new line
$ - asserts the end of the line
When you want to exclude individual characters, you can use a negated character class. When you want to exclude a particular sequence of more than one character, you need a negative lookahead, and the regex becomes:
^Z(?!IU).*$
Also note, your first word ZISADR will match as Z is not followed by IU
Your regex, Z[^(IU)]+.*$, matches a Z, and then the character class [^(IU)]+ matches one or more characters other than (, I, U and ). After that, .* matches any characters zero or more times. The class excludes those four characters individually, not the sequence IU, which is not the behavior you wanted.
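To see the character-class behavior concretely, here is a small Python sketch (Python's re module is an assumption; the question does not name a flavor). The variable names are illustrative.

```python
import re

# Inside [...] the parentheses are literal characters, not a group.
single_char = re.compile(r"[^(IU)]")
rejected = [c for c in "(IU)" if single_char.fullmatch(c) is None]
print(rejected)  # each of these four characters is excluded individually

# The original attempt, applied to the three strings from the question.
attempt = re.compile(r"Z[^(IU)]+.*$")
outcome = {s: attempt.search(s) is not None for s in ["ZISADR", "ZIUSADR", "ZDDDDR"]}
print(outcome)
```

ZISADR is rejected because the character right after Z is an I, which the class excludes on its own, regardless of what follows it.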
Edit: To provide a solution without look ahead
A non-lookahead based solution would be to use this regex,
^Z(?:I[^U]|[^I]U|[^I][^U]).*$
This regex has three main alternations which incorporate all cases needed to cover.
I[^U] - Ensures if second character is I then third shouldn't be U
[^I]U - Ensures if third character is U then second shouldn't be I
[^I][^U] - Covers the remaining case, where the second character is not I and the third is not U.
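The three patterns proposed in this thread can be compared side by side. This is a sketch using Python's re module (an assumption); the extra inputs "ZI" and "Z" are made up to probe short strings and are not from the question.

```python
import re

patterns = {
    "first answer": r"^Z(?:I[^U]|[^I]).*$",
    "lookahead": r"^Z(?!IU).*$",
    "alternation": r"^Z(?:I[^U]|[^I]U|[^I][^U]).*$",
}
samples = ["ZISADR", "ZIUSADR", "ZDDDDR", "ZI", "Z"]

# For each pattern, list the samples it accepts.
table = {name: [s for s in samples if re.search(p, s)] for name, p in patterns.items()}
for name, accepted in table.items():
    print(f"{name:12} -> {accepted}")
```

All three agree on the strings from the question. They differ only on very short inputs: the lookahead version accepts "Z" and "ZI", while the character-class versions need at least one or two characters after the Z.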

Match numbers after first character

I'd like to use Regex to determine whether the characters after the first are all numbers.
For example:
A123 would be valid as after A there are only numbers
A12B would be invalid as, after the first character, there is another letter
I essentially want to ignore the first character
I have so far this:
(?<=A)\w*(?=)
but this makes A12B or A1B2C valid, I only want numbers after A.
You could match a non-digit with \D, followed by one or more digits with \d+. If that has to be the whole string, add the anchors ^ and $ to assert the start and the end of the string.
^\D\d+$
That will match:
^ Start of the string
\D Match not a digit
\d+ Match 1+ digits making sure there are digits
$ End of the string
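A minimal Python check of this pattern (Python is an assumption; the question does not say which flavor is in use). The inputs "A" and "1234" are added for illustration.

```python
import re

pattern = re.compile(r"^\D\d+$")

samples = ["A123", "A12B", "A1B2C", "A", "1234"]
verdicts = {s: bool(pattern.search(s)) for s in samples}
for s, ok in verdicts.items():
    print(f"{s}: {'valid' if ok else 'invalid'}")
```

Note that "A" alone is invalid because \d+ requires at least one digit, and "1234" is invalid because \D refuses a digit in first position.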
The best solution I can think of is:
^.\d*$
^ - Start of the line
. - Any character (except line terminators)
\d* - zero or more digits:
  \d - a digit
  * - repeated any number of times (including 0 times; if you want it to be at least 1, change it to +)
$ - End of the line
let regex = /^.\d*$/;
let testStrings = ['A123', 'A12B'];
testStrings.forEach(str => {
  console.log(`${str} is ${regex.test(str) ? 'valid' : 'invalid'}`);
});
Your attempt is very complicated, especially given how simple your goal is.
Succeeding at regexes is all about simplicity.
The first character can be anything, so just go with a dot (.).
The next ones are all digits, so you want \d.
Add a star (*) to allow any number of repetitions, or use + if you want at least one.
Finally, you need to anchor your regex at the beginning and at the end, else it would match stuff like A123XXXXX or XXXXA123.
Note that some matching functions anchor the pattern implicitly (for example, Python's re.match anchors at the beginning and Java's String.matches anchors at both ends), in which case you can omit the corresponding anchor.
Final regex:
^.\d*$
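The effect of the anchors can be sketched in Python (an assumption; the original answer shows JavaScript). In Python, re.search scans the whole string, re.match anchors at the start only, and re.fullmatch anchors at both ends.

```python
import re

samples = ["A123", "A12B", "A123XXXXX", "XXXXA123", "A"]

unanchored = re.compile(r".\d*")
anchored = re.compile(r"^.\d*$")

# Without anchors, search finds the pattern somewhere in every sample.
loose = [s for s in samples if unanchored.search(s)]
# With both anchors, the whole string must fit.
strict = [s for s in samples if anchored.search(s)]
# fullmatch supplies both anchors implicitly.
via_fullmatch = [s for s in samples if unanchored.fullmatch(s)]
# match supplies only the leading anchor, so the trailing $ is still needed.
start_only = [s for s in samples if re.match(r".\d*$", s)]

print(loose, strict, via_fullmatch, start_only, sep="\n")
```

"A" is accepted by the anchored forms because * allows zero digits; switching to + would reject it.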
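Maybe
(?<=.{1,1})([0-9]+)(?=\s)
(?<=.{1,1}) - is immediately preceded by one character (on its own, this does not limit how many characters come before that)
([0-9]+) - at least one digit
(?=\s) - has a whitespace after
Put ^ inside the lookbehind, as in (?<=^.), to require that the single character starts the line (a ^ placed in front of the lookbehind could never match, because nothing precedes the start of the line)
Replace (?=\s) with $ for end of line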
^[a-zA-Z][0-9]{3}$
^ - start of the string, so the match must begin with what follows (here, a letter). Read it together with the class as ^[a-zA-Z]
[a-zA-Z] - any one letter: a-z covers lowercase and A-Z covers uppercase (you may change this if required)
[0-9] - any digit
{3} - how many digits to match, here exactly three. Read it as [0-9]{3}
$ - end of the string (so, in this case, the string must end right after the 3 digits)
Here you can play around - https://regex101.com/r/mqUHvP/5

I need a unique regex that requires at least one letter and disallows + and any form of blank space

I broke it down to two, but I'm wondering if it's possible in one.
My two regex
/^[^\s+ ]+$/
/(.*[a-zA-Z].*)/
You can use
/^[^+\s]*[a-z][^+\s]*$/i
The pattern matches:
^ - start of string
[^+\s]* - zero or more characters other than + and whitespace
[a-z] - a letter (case insensitive - see /i modifier)
[^+\s]* - zero or more characters other than + and whitespace
$ - end of string
This expression requires only one letter, and there can be any number of characters other than whitespace and a plus on both sides of the letter.
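A small Python sketch of this pattern, with re.IGNORECASE standing in for the /i modifier (Python is an assumption; the question uses JavaScript-style literals). The sample strings are made up for illustration.

```python
import re

# re.IGNORECASE plays the role of the /i flag in the original literal.
pattern = re.compile(r"^[^+\s]*[a-z][^+\s]*$", re.IGNORECASE)

samples = ["abc", "12a34", "X", "123", "a+b", "a b", ""]
accepted = [s for s in samples if pattern.search(s)]
print(accepted)
```

The strings with no letter, with a plus, with a space, and the empty string are all rejected.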
Try this. I'm not sure what you mean by "unique", though:
/^[^+\s]*[A-Za-z][^+\s]*$/
Why not both?
^(?=.*[a-zA-Z])[^\s+]+$
Uses lookahead.
^(?=.*[a-zA-Z])[^\s+]+$
^ start of string
(?=.*[a-zA-Z]) make sure there is at least one letter ahead
[^\s+]+ make sure every character is neither a plus nor a whitespace character
$ end of string
Notice how I changed your [^\s+ ] into my [^\s+] because \s already includes the space (U+0020).
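One way to gain confidence in the single lookahead regex is to compare it with the asker's two original expressions on a handful of inputs. This is a sketch in Python (an assumption), with made-up sample strings and a hypothetical helper named both.

```python
import re

single = re.compile(r"^(?=.*[a-zA-Z])[^\s+]+$")

# The asker's two separate checks.
no_plus_or_space = re.compile(r"^[^\s+ ]+$")
has_letter = re.compile(r"(.*[a-zA-Z].*)")


def both(text: str) -> bool:
    """Apply the two original checks together (hypothetical helper)."""
    return bool(no_plus_or_space.search(text)) and bool(has_letter.search(text))


samples = ["abc", "12a34", "123", "a+b", "a b", "", "-_a"]
agree = all(bool(single.search(s)) == both(s) for s in samples)
accepted = [s for s in samples if single.search(s)]
print(agree, accepted)
```

Agreement on a few samples is not a proof of equivalence, only a sanity check.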

regex to match entire words containing only certain characters

I want to match entire words (or strings really) that contain only defined characters.
For example if the letters are d, o, g:
dog = match
god = match
ogd = match
dogs = no match (because the string also has an "s" which is not defined)
gods = no match
doog = match
gd = match
In this sentence:
dog god ogd, dogs o
...I would expect to match on dog, god, and o (not ogd, because of the comma or dogs due to the s)
This should work for you
\b[dog]+\b(?![,])
Explanation
r"""
\b # Assert position at a word boundary
[dog] # Match a single character present in the list “dog”
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
\b # Assert position at a word boundary
(?! # Assert that it is impossible to match the regex below starting at this position (negative lookahead)
[,] # Match the character “,”
)
"""
The following regex represents one or more occurrences of the three characters you're looking for:
[dog]+
Explanation:
The square brackets mean: "any of the enclosed characters".
The plus sign means: "one or more occurrences of the previous expression"
This would be the exact same thing:
[ogd]+
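For comparison, here is how the bare character class and the earlier \b[dog]+\b(?![,]) pattern behave on the sentence from the question. This is a sketch in Python (an assumption; the asker's flavor is not stated), and the input "ogd." is made up to probe other punctuation.

```python
import re

sentence = "dog god ogd, dogs o"

# The bare class also matches inside longer words and before the comma.
bare = re.findall(r"[dog]+", sentence)

# Word boundaries plus a lookahead that rejects a following comma.
bounded = re.findall(r"\b[dog]+\b(?![,])", sentence)

# Only the comma is excluded, so other punctuation still lets a word through.
other_punctuation = re.findall(r"\b[dog]+\b(?![,])", "ogd.")

print(bare, bounded, other_punctuation, sep="\n")
```

If other punctuation should also disqualify a word, the lookahead class would need to list it, or a whitespace-based condition could be used instead.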
Which regex flavor/tool are you using? (e.g. JavaScript, .NET, Notepad++, etc.) If it's one that supports lookahead and lookbehind, you can do this:
(?<!\S)[dog]+(?!\S)
This way, you'll only get matches that are at the beginning of the string or preceded by whitespace, and that are at the end of the string or followed by whitespace. If you can't use lookbehind (for example, in JavaScript engines that predate ES2018) you can spell out the leading condition:
(?:^|\s)([dog]+)(?!\S)
In this case you would retrieve the matched word from group #1. But don't take the next step and try to replace the lookahead with (?:$|\s). If you did that, the first hit ("dog") would consume the trailing space, and the regex wouldn't be able to use it to match the next word ("god").
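The difference between the three variants can be checked directly. A sketch in Python (an assumption), where findall returns group #1 whenever the pattern contains a single capturing group:

```python
import re

sentence = "dog god ogd, dogs o"

# Lookbehind and lookahead: nothing around the word is consumed.
with_lookarounds = re.findall(r"(?<!\S)[dog]+(?!\S)", sentence)

# Leading condition spelled out; the word is captured in group 1.
spelled_out = re.findall(r"(?:^|\s)([dog]+)(?!\S)", sentence)

# Trailing condition spelled out too: the space after "dog" is consumed,
# so it is no longer available to introduce "god".
consuming = re.findall(r"(?:^|\s)([dog]+)(?:$|\s)", sentence)

print(with_lookarounds, spelled_out, consuming, sep="\n")
```

The last variant loses "god", which is the pitfall described above.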
Depending on the language, this should do what you need it to do. It will only match what you said above;
this regex:
[dog]+(?![\w,])
in a string of ..
dog god ogd, dogs o
will only match..
dog, god, and o
Anything between two square brackets ([]) is a character class: it matches any one of the characters between the brackets. You can also use ranges such as [0-9] or [a-z], but the class still matches only 1 character. The + and * are quantifiers: + matches 1 or more repetitions, while * matches zero or more. You can specify an explicit repetition count with curly brackets ({}), putting one or two numbers in between: {2} matches exactly 2 repetitions, while {1,3} matches between 1 and 3.
Anything between parentheses (()) is a capturing group, whose matched text can be retrieved later, for example to return it or to reuse it in a replacement string. The (?!...) construct is a negative lookahead: the match succeeds only if the text that follows does not match the pattern inside it (here, the character class), and the lookahead itself consumes no characters.
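The building blocks described here can be tried out in a few lines. This is a sketch in Python (an assumption; the answer mentions JavaScript and PHP), and the test strings other than the sentence from the question are made up.

```python
import re

# {2} means exactly two repetitions; {1,3} means one to three.
exactly_two = re.compile(r"[0-9]{2}")
one_to_three = re.compile(r"[0-9]{1,3}")
two_only = [n for n in range(5) if exactly_two.fullmatch("7" * n)]
lengths = [n for n in range(5) if one_to_three.fullmatch("7" * n)]

# Capturing groups can be reused in a replacement via \1, \2.
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "dog god")

# Negative lookahead: the match must not be followed by a word character or a comma.
words = re.findall(r"[dog]+(?![\w,])", "dog god ogd, dogs o")

print(two_only, lengths, swapped, words, sep="\n")
```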