How to write regular expression to match only numbers, letters and dashes? - regex

I need an expression that will only accept:
numbers
normal letters (no special characters)
-
Spaces are not allowed either.
Example:
The regular expression should match:
this-is-quite-alright
It should not match
this -is/not,soålright

You can use:
^[A-Za-z0-9-]*$
This matches strings, possibly empty, that is wholly composed of uppercase/lowercase letters (ASCII A-Z), digits (ASCII 0-9), and a dash.
This matches (as seen on rubular.com):
this-is-quite-alright
and-a-1-and-a-2-and-3-4-5
yep---------this-is-also-okay
And rejects:
this -is/not,soålright
hello world
Explanation:
^ and $ are beginning and end of string anchors respectively
If you're looking for matches within a string, then you don't need the anchors
[...] is a character class
a-z, A-Z, 0-9 in a character class define ranges
- as a last character in a class is a literal dash
* is zero-or-more repetition
regular-expressions.info
Anchors, Character Class, Repetition
Variation
The specification was not clear, but if - is only to be used to separate "words", i.e. no double dash, no trailing dash, no preceding dash, then the pattern is more complex (only slightly!)
_"alpha"_ separating dash
/ \ /
^[A-Za-z0-9]+(-[A-Za-z0-9]+)*$
\__________/| \__________/|\
"word" | "word" | zero-or-more
\_____________/
group together
This matches strings that is at least one "word", where words consists of one or more "alpha", where "alpha" consists of letters and numbers. More "words" can follow, and they're always separated by a dash.
This matches (as seen on rubular.com):
this-is-quite-alright
and-a-1-and-a-2-and-3-4-5
And rejects:
--no-way
no-way--
no--way

[A-z0-9-]+
But your question is confusing as it asks for letters and numbers and has an example containing a dash.

This is a community wiki, an attempt to compile links to related questions about this "URL/SEO slugging" topic. Community is invited to contribute.
Related questions
regex/php: how can I convert 2+ dashes to singles and remove all dashes at the beginning and end of a string?
-this--is---a-test-- becomes this-is-a-test
Regex for [a-zA-Z0-9-] with dashes allowed in between but not at the start or end
allow spam123-spam-eggs-eggs1 reject eggs1-, -spam123, spam--spam
Translate “Lorem 3 ipsum dolor sit amet” into SEO friendly “Lorem-3-ipsum-dolor-sit-amet” in Java?
Related tags
[slug]

Related

Regular Expression containing letters and spaces in specified fashion

I am working on a text processing Api in java. I need to match the strings which are:
At least 8 characters in length.
Should only contain uppercase letters, lowercase letters or spaces.
Spaces should not be present in between the letters. They can however be leading or trailing. The String can also contain only spaces which are at least 8.
Regular expression which I tried but failed:
^\s*[a-zA-Z]{8,}\s*$
Demo of my tries in here.
Any help will be welcomed.
You can use the below regex to achieve your result:
^(?=.{8,}) *[a-zA-Z]* *$
Explanation of the above regex:
^ - denotes start of the test String.
(?=) - Positive lookahead.
.{8,} - any character other than newline with length at least 8.
* - 0 or more spaces in order to match the leading spaces.(\s is avoided)
[a-zA-Z]* - 0 or more letters (uppercase or lowercase). (You can use [a-z]* along with i(case insensitive) flag. Although, there will be no effect on performance.)
* - 0 or more spaces in order to match the trailing spaces.(\s is avoided)
$ - denotes end of the test String.
Above regex demo.

Accept only numbers but ignore if two number groups have spaces in between them [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 3 years ago.
I'm trying to develop a regex with the following rules:
it should accept solely numbers,
if the string contains any letters or any other special characters, the whole string should be rejected,
regarding spaces, there should only be one consecutive number group, which can be surrounded by spaces,
if there are more than one consecutive number group, with spaces in between the groups, that whole string should be rejected.
Example Cases:
accepted:
1234
[SPACE][SPACE]111[SPACE]
[SPACE]111[SPACE][SPACE]
declined:
1a234
aa1234aa
1234a
12#4
[SPACE]11[SPACE]111
[SPACE]11[SPACE]111#
So far, I've come up with this ([0-9]+[^\s]*) which can be seen here.
What modifications do I have to do to achieve the scenario I want above?
Use this:
^\s*\d+\s*$
All we need to do is accept one or more digits bounded by zero or more spaces on either side.
EDIT:
Just add a capturing group around the digits to use them later:
^\s*(\d+)\s*$
Demo
The pattern you tried ([0-9]+[^\s]*) matches 1+ digits and 0+ times a non whitespace character using a negated character class [^\s]* matching any character except a whitespace char (So it would match aa)
It can match multiple times in the same string as there are no anchors asserting the start ^ and the end $ of the string.
If you want to match spaces, instead of matching \s which could also match newlines, you could match a single space and repeat that 0+ times on the left and on the right side.
^ *[0-9]+ *$
Regex demo
If you only need the digits, you could use a capturing group
^ *([0-9]+) *$
Regex demo
^\s*[0-9]+\s*$
notice that I've used [0-9] instead of \d
[0-9] will accept only Arabic number (Western Arabic Number)
\d may accept all form of digit in unicode like Eastern Arabic Number, Thai,...etc like (١,٢,٣, ๑,๒,๓, ...etc) at least this is the case in XSD regex when its validate XML file.

Negating a complex regex containing three parts

I need a regex which is matched when the string doesn't have both lowercase and uppercase letters.
If the string has only lowercase letters -> should be matched
If the string has only uppercase letters -> should be matched
If the string has only digits or special characters -> should be matched
For example
abc, ABC, 123, abc123, ABC123&^ - should match
AbC, A12b, AB^%12c - should not match
Basically I need an inverse/negation of the following regex:
^(?=.*[a-z])(?=.*[A-Z]).+$
Does not sound like any lookarounds would be needed.
Either match only characters that are not a-z or only characters, that are not A-Z.
^(?:[^a-z]+|[^A-Z]+)$
See this demo at regex101 (used + for one or more)
You may use
^(?!.*[A-Z].*[a-z])(?!.*[a-z].*[A-Z])\S+$
Or
^(?=(?:[^a-z]+|[^A-Z]+)$).*$
See the regex demo #1 and regex demo #2
A lookaround solution like this can be used in more complex scenarios, when you need to apply more restrictions on the pattern. Else, consider a non-lookaround solution.
Details
^ - start of string
(?!.*[A-Z].*[a-z]) - no uppercase followed with a lowercase letter
(?!.*[a-z].*[A-Z]) - no lowercase letter followed with an uppercase one
(?=(?:[^a-z]+|[^A-Z]+)$) - a positive lookahead that requires 1 or more characters other than lowercase ASCII letters ([^a-z]+) to the end of the string, or 1 or more characters other than uppercase ASCII letters ([^A-Z]+) to the end of the string
.+ - 1+ chars other than line break chars
$ - end of string.
You can use this regex
^(([A-Z0-9?&%^](?![a-z]))+|([a-z0-9?&%^](?![A-Z]))+)$
You can test more cases here.
I've only added the characcter ?&%^ as possible character, but you could add which ever you like.
I would go with:
^(?:[^a-z]+?|[^A-Z]+?)$
It translates to "If the entire string is composed of non-lowercase letters or non-uppercase letters then match the string."
Lazy quantifiers +? are used so that the end-string $ anchor is obeyed when the multiline flag is enabled. If you're only validating a single-line string the you can simply use + without the question mark.
If you have a whitelist of specific allowed special chars then change [^A-Z] into [A-Z0-9()_+=-] and list the allowed special chars.
https://regex101.com/r/Wg6tLn/1

REGEX to find the first one or two capitalized words in a string

I am looking for a REGEX to find the first one or two capitalized words in a string. If the first two words is capitalized I want the first two words. A hyphen should be considered part of a word.
for Madonna has a new album I'm looking for madonna
for Paul Young has no new album I'm looking for Paul Young
for Emmerson Lake-palmer is not here I'm looking for Emmerson Lake-palmer
I have been using ^[A-Z]+.*?\b( [A-Z]+.*?\b){0,1} which does great on the first two, but for the 3rd example I get Emmerson Lake, instead of Emmerson Lake-palmer.
What REGEX can I use to find the first one or two capitalized words in the above examples?
You may use
^[A-Z][-a-zA-Z]*(?:\s+[A-Z][-a-zA-Z]*)?
See the regex demo
Basically, use a character class [-a-zA-Z]* instead of a dot matching pattern to only match letters and a hyphen.
Details
^ - start of string
[A-Z] - an uppercase ASCII letter
[-a-zA-Z]* - zero or more ASCII letters / hyphens
(?:\s+[A-Z][-a-zA-Z]*)? - an optional (1 or 0 due to ? quantifier) sequence of:
\s+ - 1+ whitespace
[A-Z] - an uppercase ASCII letter
[-a-zA-Z]* - zero or more ASCII letters / hyphens
A Unicode aware equivalent (for the regex flavors supporting Unicode property classes):
^\p{Lu}[-\p{L}]*(?:\s+\p{Lu}[-\p{L}]*)?
where \p{L} matches any letter and \p{Lu} matches any uppercase letter.
This is probably simpler:
^([A-Z][-A-Za-z]+)(\s[A-Z][-A-Za-z]+)?
Replace + with * if you expect single-letter words.
If u need a Full name only (a two words with the first capitalize letters), this is a simple example:
^([A-Z][a-z]*)(\s)([A-Z][a-z]+)$
Try it. Enjoy!

regex to match entire words containing only certain characters

I want to match entire words (or strings really) that containing only defined characters.
For example if the letters are d, o, g:
dog = match
god = match
ogd = match
dogs = no match (because the string also has an "s" which is not defined)
gods = no match
doog = match
gd = match
In this sentence:
dog god ogd, dogs o
...I would expect to match on dog, god, and o (not ogd, because of the comma or dogs due to the s)
This should work for you
\b[dog]+\b(?![,])
Explanation
r"""
\b # Assert position at a word boundary
[dog] # Match a single character present in the list “dog”
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
\b # Assert position at a word boundary
(?! # Assert that it is impossible to match the regex below starting at this position (negative lookahead)
[,] # Match the character “,”
)
"""
The following regex represents one or more occurrences of the three characters you're looking for:
[dog]+
Explanation:
The square brackets mean: "any of the enclosed characters".
The plus sign means: "one or more occurrences of the previous expression"
This would be the exact same thing:
[ogd]+
Which regex flavor/tool are you using? (e.g. JavaScript, .NET, Notepad++, etc.) If it's one that supports lookahead and lookbehind, you can do this:
(?<!\S)[dog]+(?!\S)
This way, you'll only get matches that are either at the beginning of the string or preceded by whitespace, or at the end of the string or followed by whitespace. If you can't use lookbehind (for example, if you're using JavaScript) you can spell out the leading condition:
(?:^|\s)([dog]+)(?!\S)
In this case you would retrieve the matched word from group #1. But don't take the next step and try to replace the lookahead with (?:$|\s). If you did that, the first hit ("dog") would consume the trailing space, and the regex wouldn't be able to use it to match the next word ("god").
Depending on the language, this should do what you need it to do. It will only match what you said above;
this regex:
[dog]+(?![\w,])
in a string of ..
dog god ogd, dogs o
will only match..
dog, god, and o
Example in javascript
Example in php
Anything between two [](brackets) is a character class.. it will match any character between the brackets. You can also use ranges.. [0-9], [a-z], etc, but it will only match 1 character. The + and * are quantifiers.. the + searches for 1 or more characters, while the * searches for zero or more characters. You can specify an explicit character range with curly brackets({}), putting a digit or multiple digits in-between: {2} will match only 2 characters, while {1,3} will match 1 or 3.
Anything between () parenthesis can be used for callbacks, say you want to return or use the values returned as replacements in the string. The ?! is a negative lookahead, it won't match the character class after it, in order to ensure that strings with the characters are not matched when the characters are present.