I am not a regex expert, but I tried the pattern below and would like to improve it. Can someone please help?
The pattern can contain ASCII letters, spaces, commas, periods, apostrophes (') and hyphens (-), and there can be one digit at the end of the string.
So far, this works well:
/^[a-z ,.'-]+(\d{1})?$/i
But I want to add the condition that there must be at least 2 letters. Could you please tell me how to achieve this, and explain it a bit as well?
Note that {1} is always redundant in any regex; please remove it to make the pattern more readable. (\d{1})? is equivalent to \d? and matches an optional digit.
Taking into account the string must start with a letter, you can use
/^(?:[a-z][ ,.'-]*){2,}\d?$/i
Details:
^ - start of string
(?: - start of a non-capturing group (it is used here as a container for a pattern sequence to quantify):
[a-z] - an ASCII letter
[ ,.'-]* - zero or more spaces, commas, dots, single quotation marks or hyphens
){2,} - end of group, repeat two or more ({2,}) times
\d? - an optional digit
$ - end of string
i - case insensitive matching is ON.
See the regex demo.
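To see how this behaves, here is a small JavaScript sketch that runs the pattern above against a few strings. The sample inputs and the `hasTwoLetters` helper name are made up for illustration.

```javascript
// Pattern from the answer above: at least two letters, each optionally
// followed by separators, then an optional trailing digit.
const twoLetterPattern = /^(?:[a-z][ ,.'-]*){2,}\d?$/i;

// Hypothetical helper name, used only for this illustration.
function hasTwoLetters(input) {
  return twoLetterPattern.test(input);
}

const samplesA = ["Jo", "O'Neil-Smith 2", "J", "a1", "--", "ab12"];
for (const s of samplesA) {
  console.log(JSON.stringify(s), hasTwoLetters(s));
}
```

Only the first two samples should pass: the others have fewer than two letters, do not start with a letter, or end in more than one digit.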
The thing to change in your regex is the + after the list of allowed characters.
+ means one or more occurrences of the listed characters. If you want two or more, you can use {2,}. Note that this counts any of the allowed characters, not only letters, so a string such as -- would also pass.
So your regex would look something like
/^[a-z ,.'-]{2,}\d?$/i
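A quick way to check what {2,} on the character class actually counts is to run it next to the letter-group pattern from the other answer. This is only a JavaScript sketch; the variable names and sample strings are chosen for illustration.

```javascript
// Variant that quantifies the whole character class.
const anyTwoChars = /^[a-z ,.'-]{2,}\d?$/i;
// Variant that repeats "one letter plus optional separators".
const twoLetterGroups = /^(?:[a-z][ ,.'-]*){2,}\d?$/i;

for (const s of ["ab", "a-", "--"]) {
  console.log(JSON.stringify(s), anyTwoChars.test(s), twoLetterGroups.test(s));
}
```

Both variants accept "ab", but only the first one accepts "a-" and "--", since it counts characters rather than letters.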
I'm trying to come up with a regex for domain names that are 2-30 characters long and consist of alphanumeric characters separated by a single hyphen, with no other special characters allowed.
Something like this: thisi67satest-mydomain
What I have at the moment is /^[a-z0-9-]{2,30}$/ but this doesn't cover all scenarios, especially with respect to the single hyphen.
I've always tried to google my way through these regexes. The example above allows more than one hyphen, which I don't want. How can I make the single hyphen mandatory?
Try this:
^(?=.{2,30}$)[a-z0-9]+-[a-z0-9]+$
^ the start of the line/string.
(?=.{2,30}$) ensures that the string is between 2 and 30 characters long.
[a-z0-9]+ one or more lowercase letters or digits.
- one literal -.
[a-z0-9]+ one or more lowercase letters or digits.
$ end of the line/string.
See regex demo
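As a sanity check, the pattern can be tried in JavaScript against a few names. This is only a sketch; the sample names other than the one from the question are invented.

```javascript
// The lookahead limits the total length to 2-30; the rest requires exactly
// one hyphen with at least one lowercase letter or digit on each side.
const singleHyphenName = /^(?=.{2,30}$)[a-z0-9]+-[a-z0-9]+$/;

const samplesC = ["thisi67satest-mydomain", "a-b", "nohyphen", "a-b-c", "-abc", "abc-"];
for (const s of samplesC) {
  console.log(s, singleHyphenName.test(s));
}
```

Since the body of the pattern needs at least three characters (letter, hyphen, letter), the lower bound of 2 in the lookahead never comes into play; only the upper bound of 30 matters.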
I think the following pattern will work for you. Let me know if it works.
(\w|-(?!-)){2,30}
I am using regex to clean some text files.
In some places, spaces are missing as in the second line below:
1.9 Beef Curry
1.10Banana Pie
1.11 Corn Gravy
I need an expression to find a zero-length match at the position between 0 and B, so that I can replace it (in Notepad++) with a space. Note that each number can be one or two digits, and there can also be one level (e.g. 1. Exotic Dishes) or three levels (e.g. 2.5.1 Chicken).
Can someone please give the answer?
I would have thought one of the following should work, but Notepad++ calls them invalid. I would also appreciate it if someone could tell me why...
(?<=\.\d\d|\.\d)(?! )(?!\.)
(?<=\.\d{1,3)(?! )(?!\.)
Thanks in advance!
It may be enough to look for zero-length positions at \B (a non-word boundary, here between two word characters) and check that the position is preceded by a digit and not followed by a digit. If so, replace it with a space.
\B(?<=\d)(?!\d)
See this demo at regex101
\B matches at any non-word boundary
(?<=\d) looks behind for a digit
(?!\d) looks ahead for no digit
To further restrict the digit part to a dot followed by 1-3 digits, try something like \.\d{1,3}\B\K(?!\d) where \K resets the beginning of the reported match. Or drop \K and replace with $0 followed by a space.
Just to mention: the underscore also counts as a word character. If your input contains underscores, e.g. something like 1_, and you don't want to add a space there, change the lookahead to (?![\d_])
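The basic \B(?<=\d)(?!\d) idea can also be tried outside Notepad++. The sketch below uses JavaScript and assumes an engine with lookbehind support (ES2018 or later); the sample lines are taken from the question, with the typo in the one-level example corrected.

```javascript
// Zero-width match: a non-word-boundary position that is preceded by a
// digit and not followed by a digit.
const missingSpace = /\B(?<=\d)(?!\d)/g;

const menuLines = [
  "1.9 Beef Curry",
  "1.10Banana Pie",
  "1.11 Corn Gravy",
  "2.5.1 Chicken",
  "1. Exotic Dishes",
];

// Insert a space at every matching position; other lines stay untouched.
const fixedLines = menuLines.map((line) => line.replace(missingSpace, " "));
console.log(fixedLines.join("\n"));
```

Only the second line changes, because the position between a digit and a space or dot is a word boundary and so \B does not match there.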
You may use one of
^\d[\d.]*+(?!\h)
^\d[\d.]*+(?! )
^(?>\d+(?:\.\d+)*\.?)(?!\h)
Replace with $& followed by a space.
Details
^\d[\d.]*+(?!\h) matches a digit and then 0 or more digits/dots and once they are all matched, a horizontal whitespace is checked for. If there is no whitespace, there is a match.
^\d[\d.]*+(?! ) is the same, just the check is performed for a regular space.
^(?>\d+(?:\.\d+)*\.?)(?!\h) is more specific, it matches
^ - start of line
(?>\d+(?:\.\d+)*\.?) - an atomic group preventing backtracking:
\d+ - 1+ digits
(?:\.\d+)* - 0 or more sequences of . and 1+ digits
\.? - an optional dot
(?!\h) - no horizontal whitespace allowed immediately on the right
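To illustrate why the atomic group matters, here is a JavaScript sketch. It is not the Notepad++ pattern itself: JavaScript has no atomic groups and no \h, so the sketch assumes the usual emulation of (?>...) with a lookahead plus backreference, and narrows \h to [ \t]. All variable names are invented for the example.

```javascript
// Assumption: (?>...) is emulated as (?=(...))\1, and \h is narrowed to [ \t].
const atomicNumbering = /^(?=(\d+(?:\.\d+)*\.?))\1(?![ \t])/gm;
// Same pattern without the atomic emulation, for comparison only.
const backtrackingNumbering = /^(\d+(?:\.\d+)*\.?)(?![ \t])/gm;

const menuText = "1.9 Beef Curry\n1.10Banana Pie\n1.11 Corn Gravy";

// "$1 " re-inserts the captured numbering followed by a space.
const fixedMenu = menuText.replace(atomicNumbering, "$1 ");
const overEagerMenu = menuText.replace(backtrackingNumbering, "$1 ");

console.log(fixedMenu);
console.log(overEagerMenu);
```

With the emulated atomic group only the second line gains a space. Without it, the engine backtracks into the numbering and also splits the first line into "1. 9 Beef Curry", which is exactly what the atomic group or possessive quantifier prevents.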
My alternative attempt also works:
Find what: ^(\d\.\d+) ?(?=\w)
Replace with: $1 followed by a space
I need a regular expression that matches strings with at least one letter A-Z and, optionally, any number and combination of .-¤ (dot, dash and "sun"(what's it called in English?)).
Matched strings would be
A
AB
A-.
¤A
but NOT
-.
¤
since they don't have any letters.
My first try was of course ^[A-Z¤-.]*$ but that matches strings without letters as well.
[A-Z]+ matches strings with at least one letter
[¤.-]* matches strings that might have ¤.- in them
I've tried to combine these two last in a number of ways but haven't managed to solve my problem.
Is there a way to combine these last two regexes when I can't expect any particular order between the letters and the characters ¤.- and at the same time exclude any other characters?
Maybe groups or non-capturing groups have something to do with it, but I don't yet fully understand those.
PS I'm implementing this with the DB2 function REGEXP_LIKE.
You may use
^[A-Z.¤-]*[A-Z][A-Z.¤-]*$
Details
^ - start of string
[A-Z.¤-]* - 0+ uppercase letters, ., ¤ or -
[A-Z] - an uppercase letter
[A-Z.¤-]* - 0+ uppercase letters, ., ¤ or -
$ - end of string.
See how this regex matches sample strings.
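The question targets DB2's REGEXP_LIKE, but the logic of the pattern can be checked in any engine. Below is a JavaScript sketch using the sample strings from the question; it assumes the DB2 flavor treats this character class the same way, which I have not verified.

```javascript
// Allowed characters on both sides, with one mandatory letter in between.
const hasLetter = /^[A-Z.¤-]*[A-Z][A-Z.¤-]*$/;

const shouldMatch = ["A", "AB", "A-.", "¤A"];
const shouldNotMatch = ["-.", "¤"];

for (const s of shouldMatch) console.log(s, hasLetter.test(s));
for (const s of shouldNotMatch) console.log(s, hasLetter.test(s));
```

The mandatory [A-Z] in the middle is what rejects strings made only of ., ¤ and -, while the two surrounding classes keep the order of letters and symbols free.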
I am trying to write some regex that will match a string that contains 4 or more letters in it that are not necessarily in sequence.
The input string can have a mix of upper and lowercase letters, numbers, non-alpha chars etc, but I only want it to pass the regex test if it contains at least 4 upper or lowercase letters.
An example of what I would like to be a valid input can be seen below:
a124Gh0st
I have currently written this piece of regex:
(?(?=[a-zA-Z])([a-zA-Z])| )
This returns 5 matches successfully, but it currently always passes as long as I have more than 1 letter in the input string. If I add {4,} to the end of it then it works, but only when there are 4 letters in a row.
I am using the following website to test what I have been doing: regex101
Any help on this would be greatly appreciated.
You may use
(?s)^([^a-zA-Z]*[A-Za-z]){4}.*
or
^([^a-zA-Z]*[A-Za-z]){4}[\s\S]*
See the regex demo.
Details:
^ - start of string
([^a-zA-Z]*[A-Za-z]){4} - exactly 4 sequences of:
[^a-zA-Z]* - 0+ chars other than ASCII letters
[A-Za-z] - an ASCII letter
[\S\s]* - any 0+ chars (same as .* if the DOTALL modifier is enabled).
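A short JavaScript sketch of the second form, run against the sample from the question plus a few invented inputs:

```javascript
// Four repetitions of "any non-letters, then one letter", anchored at the
// start; [\s\S]* then consumes whatever is left, including newlines.
const fourLetters = /^([^a-zA-Z]*[A-Za-z]){4}[\s\S]*/;

for (const s of ["a124Gh0st", "abcd", "a1b2c3", "1234", "a\nb\nc\nd"]) {
  console.log(JSON.stringify(s), fourLetters.test(s));
}
```

The letters do not have to be adjacent, and because [^a-zA-Z] also matches line breaks, the letters may be spread over several lines.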
Why don't you just match the zero or more characters between each letter? For example,
(?:[A-Za-z].*){4}
You'll recognize the [A-Za-z]. The . matches any character, so .* is a run of any number (including zero) of any character. The group of a letter followed by any number of any characters is repeated four times, so this pattern matches if and only if at least four letters appear in the string. (Note that the trailing .* of the fourth repeat of the pattern is mostly inconsequential, since it can match zero characters).
If you are using a regex language that supports reluctant quantifiers, then using them will make this pattern considerably more efficient. For example, in Java or Perl, one might prefer to use
(?:[A-Za-z].*?){4}
The .*? still matches any number of any character, but the matching algorithm will match as few characters as possible with each such run. This will reduce the amount of backtracking it needs to perform. For this particular pattern, it will reduce the needed backtracking to zero.
If you do not have reluctant quantifiers in your regex dialect, then you can achieve the same desirable effect a bit more verbosely:
(?:[A-Za-z][^A-Za-z]*){4}
There, only non-letters are matched for the runs between letters.
Even with this, the pattern uses some regex features not present in all regex flavors -- non-capturing groups, enumerated quantifiers -- but these are present in your original regex. For a maximally-compatible form, you might write
[A-Za-z][^A-Za-z]*[A-Za-z][^A-Za-z]*[A-Za-z][^A-Za-z]*[A-Za-z]
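To check that the reluctant form and the fully spelled-out form agree, here is a JavaScript sketch that runs both on a few single-line inputs. The variable names and the inputs other than the one from the question are made up.

```javascript
// Reluctant variant: a letter followed by as few characters as possible.
const lazyVariant = /(?:[A-Za-z].*?){4}/;
// Maximally compatible variant: letters separated by runs of non-letters.
const unrolledVariant =
  /[A-Za-z][^A-Za-z]*[A-Za-z][^A-Za-z]*[A-Za-z][^A-Za-z]*[A-Za-z]/;

for (const s of ["a124Gh0st", "----abcd", "a1b2c3", "a b c"]) {
  console.log(JSON.stringify(s), lazyVariant.test(s), unrolledVariant.test(s));
}
```

Neither pattern is anchored, so each one simply looks for four letters anywhere in the string. One difference to keep in mind: in JavaScript the . does not match line breaks by default, so the reluctant variant would need the s flag for multi-line input.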
I would like to have a regular expression that checks whether a string has up to 14 alphanumeric characters. It can include hyphens, but not at the beginning or end.
This is what I have so far:
var patt = new RegExp("^([a-zA-Z0-9]+(-[a-zA-Z0-9])*){1,14}$");
But it's not working - http://jsfiddle.net/u6cWs/1/
Any idea?
You need to use a positive lookahead (to count the alphanumeric characters, each optionally followed by a hyphen).
If only single hyphen is allowed:
^(?=([a-zA-Z0-9]-?){1,14}$)[a-zA-Z0-9]+(?:-[a-zA-Z0-9]+)?$
Demo
If multiple hyphens are allowed:
^(?=([a-zA-Z0-9]-?){1,14}$)[a-zA-Z0-9]+(?:-[a-zA-Z0-9]+)*$
Demo
Additional option:
^[a-zA-Z0-9](?:-?[a-zA-Z0-9]){0,13}$
Demo
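Here is a JavaScript sketch that exercises the two lookahead patterns above. The variable names and sample strings are invented for illustration.

```javascript
// Lookahead counts 1-14 alphanumerics, each optionally followed by a hyphen.
const singleHyphen =
  /^(?=([a-zA-Z0-9]-?){1,14}$)[a-zA-Z0-9]+(?:-[a-zA-Z0-9]+)?$/;
const multiHyphen =
  /^(?=([a-zA-Z0-9]-?){1,14}$)[a-zA-Z0-9]+(?:-[a-zA-Z0-9]+)*$/;

const samplesI = ["abc", "abc-def", "a-b-c", "-abc", "abc-", "a--b", "abcdefg-hijklmn"];
for (const s of samplesI) {
  console.log(s, singleHyphen.test(s), multiHyphen.test(s));
}
```

Note that the lookahead counts alphanumeric characters only, so hyphens do not count toward the limit: the last sample has 14 letters plus a hyphen (15 characters in total) and is still accepted.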
Here is a simple solution that is faster because it does not use lookaheads:
^[A-Za-z0-9](?:[-A-Za-z0-9]{0,12}[A-Za-z0-9])?$
See demo.
How does it work?
Like your original pattern, this regex is anchored between ^ and $, enforcing our limit on the number of characters.
The first character has to be a letter or digit.
The rest of the string, included in a (?: non-capturing group, is made optional by the ? at the end. This rest of the string, if it is there (more than one character), must end with a letter or digit. In the middle, you can have between 0 and 12 letters, digits or hyphens.
Optionally
If you want your regex to be a little shorter, turn on the case-insensitive option, and remove either the lower-case chars or the upper-case ones, for instance:
^[a-z0-9](?:[-a-z0-9]{0,12}[a-z0-9])?$
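A JavaScript sketch of the shorter, case-insensitive form, with a few invented inputs:

```javascript
// First char alphanumeric; optionally up to 12 alphanumerics or hyphens
// followed by a final alphanumeric, for at most 14 characters in total.
const maxFourteen = /^[a-z0-9](?:[-a-z0-9]{0,12}[a-z0-9])?$/i;

const samplesJ = ["a", "A1-b2", "a-", "-a", "a--b", "abcdefg-hijklm", "abcdefg-hijklmn"];
for (const s of samplesJ) {
  console.log(s, maxFourteen.test(s));
}
```

With this pattern hyphens count toward the 14-character limit, and consecutive hyphens in the middle are accepted, so "a--b" passes while the 15-character last sample does not.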
Use two regexes for simplicity and readability.
First check that it matches this:
/^[A-Za-z0-9-]{1,14}$/
then check that it does NOT match this:
/^-|-$/
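Put together, the two checks could look like the sketch below. The function name `isValidName` is hypothetical and not from the question.

```javascript
// Hypothetical helper combining the two checks described above.
function isValidName(input) {
  // Step 1: only letters, digits and hyphens, 1 to 14 characters in total.
  const allowedChars = /^[A-Za-z0-9-]{1,14}$/;
  // Step 2: reject a hyphen at the very start or the very end.
  const edgeHyphen = /^-|-$/;
  return allowedChars.test(input) && !edgeHyphen.test(input);
}

for (const s of ["abc-def", "-abc", "abc-", "abc_def", "-", ""]) {
  console.log(JSON.stringify(s), isValidName(s));
}
```

Each regex stays trivial to read on its own, at the cost of running two tests instead of one.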