We are doing loose validation on a zipcode field of the form CITY, ST, ZIP. These can span countries, so all of the following are valid:
PITTSBURGH, PA, 15020
HAMILTON,ONTARIO,L8E 4B3
All I want to validate is that we have three comma-separated words (whitespace is fine). All of these would be valid:
foo, bar, baz
foo,bar,baz123
However these would be invalid because they don't have exactly two commas and three words:
foo, bar
boo,bar,baz,bang
foo, bar,
foo,bar,baz,
What I've Tried Unsuccessfully
^[\w],[\w],[\w]$
^[a-zA-Z0-9_.-]*,[a-zA-Z0-9_.-]*,[a-zA-Z0-9_.-]*$ (Doesn't allow spaces)
Also just curious: do y'all typically allow whitespace in the regex, or prefer that the application strips whitespace first and then applies the regex? We can do either.
The pattern ^[\w],[\w],[\w]$ that you tried can be written as ^\w,\w,\w$; it matches three single word chars with a comma between them.
The pattern ^[a-zA-Z0-9_.-]*,[a-zA-Z0-9_.-]*,[a-zA-Z0-9_.-]*$ matches 3 times repeating 0 or more times any of the listed chars/ranges in the character class with a comma in between.
As the quantifier * is 0 or more times, it could possibly also match ,,
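To see what the two attempted patterns actually accept, a quick check can help. This is a minimal sketch in Python's re module; Python is an assumption here (the question names no language) and the variable names are illustrative.

```python
import re

# The two patterns from the question, unchanged.
single_char = re.compile(r"^[\w],[\w],[\w]$")
star_class = re.compile(r"^[a-zA-Z0-9_.-]*,[a-zA-Z0-9_.-]*,[a-zA-Z0-9_.-]*$")

# Each [\w] matches exactly one word character, so longer words fail.
one_char_ok = bool(single_char.match("a,b,c"))
one_char_long = bool(single_char.match("foo,bar,baz"))

# The * quantifier allows empty fields, and the class contains no space.
empty_fields = bool(star_class.match(",,"))
with_spaces = bool(star_class.match("foo, bar, baz"))

print(one_char_ok, one_char_long, empty_fields, with_spaces)
```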
If the word chars should be present in all 3 positions, and there cannot be spaces at the start and end:
^\w+(?:\s*,\s*\w+){2}$
^ Start of string
\w+ Match 1+ word chars
(?:\s*,\s*\w+){2} Repeat 2 times matching a comma between optional whitespace chars and 1+ word chars
$ End of string
Regex demo
Note that \s can also match a newline. If you want to match spaces only, and the string can also start and end with a space, you could use the pattern from @anubhava in the comments.
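As a rough check of the pattern ^\w+(?:\s*,\s*\w+){2}$ against the examples in the question, here is a small sketch in Python (an assumption, since the question names no language; is_valid is a hypothetical helper name).

```python
import re

# Three runs of word characters separated by commas, optional whitespace
# around each comma, nothing before the first word or after the last one.
THREE_WORDS = re.compile(r"^\w+(?:\s*,\s*\w+){2}$")

def is_valid(value: str) -> bool:
    """Return True when value is exactly three comma-separated words."""
    return THREE_WORDS.match(value) is not None

valid = ["PITTSBURGH, PA, 15020", "foo, bar, baz", "foo,bar,baz123"]
invalid = ["foo, bar", "boo,bar,baz,bang", "foo, bar,", "foo,bar,baz,"]

for sample in valid + invalid:
    print(repr(sample), is_valid(sample))
```

One thing to be aware of: because each field must be a single run of word chars, a value with a space inside a field, such as HAMILTON,ONTARIO,L8E 4B3, is rejected by this pattern.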
Try
^\w*\W?,\W?\w*\W?,\W?(\w| ){1,}
(I tested by your examples)
I've following (short url) strings which can be in these 3 ways:
abc-xy-helloWorld
abc-xy-helloWorld-welcome
abc-xy-helloWorld-welcome-home
I need to extract only the 'helloWorld' string. The pattern (?<=abc-xy-).* works for case #1, but how can I make it work for all 3 cases, so that it extracts only 'helloWorld' regardless of whether the input is form 1, 2 or 3?
The .* part matches any zero or more chars other than line break chars as many as possible.
In your strings, helloWorld is followed by a - char (or by the end of the string), and helloWorld consists only of letters.
So, possible solutions here are
(?<=abc-xy-)[a-zA-Z]+
(?<=abc-xy-)\w+
(?<=abc-xy-)[^-]+
See the regex demo here.
The [a-zA-Z]+ variant will match one or more ASCII letters, \w+ will match one or more letters, digits or underscores, and [^-]+ will match one or more chars other than -.
Note I use [^-\n] on the regex testing site because the input there is a single multiline text, while in a real-life situation you have these strings as separate inputs.
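A short sketch of the [^-]+ variant applied to the three inputs, written in Python as an assumption (the question does not say which engine is used; the variable names are illustrative):

```python
import re

# Lookbehind for the fixed prefix, then everything up to the next hyphen.
SLUG = re.compile(r"(?<=abc-xy-)[^-]+")

inputs = [
    "abc-xy-helloWorld",
    "abc-xy-helloWorld-welcome",
    "abc-xy-helloWorld-welcome-home",
]

extracted = [SLUG.search(s).group(0) for s in inputs]

# For comparison, the greedy .* from the question runs to the end of the input.
greedy = re.search(r"(?<=abc-xy-).*", inputs[1]).group(0)

print(extracted, greedy)
```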
I'm new to regex, and would appreciate some guidance/help.
Currently, I'm looking to write an expression, that derives a certain part of text from the 2nd line of the provided text.
Here is the text:
123 anywhere Avenue
Winnipeg, Manitoba R3E 0L7
Canada
Pharmacy Manager: person person
Pharmacy Licence Holder/Owner: 123456 Manitoba Ltd.
see correct formatting with code here
My goal is to derive the 'Manitoba' string from the second line, however I'd like to make it dynamic rather than writing an expression to always fetch Manitoba as a static. I used the below code to target the second line:
(.*)(?=(\n.*){3}$)
(It matches 3 lines up from the last line, thus targeting the desired line)
I noticed that, within the dataset, the Province (Manitoba) is always in between two spaces.
Is there any addition I can make to the code, so that the expression only targets the second line, then matches the first string in-between spaces?
Perhaps using a lazy expression with a positive lookaround?
If I target all matches in between spaces, it would take both 'Manitoba' and 'R3E 0L7', which I don't want.
I want it to only match the first piece of text in between spaces on the second line.
Any help is much appreciated :-)
Thanks.
One option could be to match the first line, then capture the second word on the second line in capturing group 1.
Then match the rest of the second line and assert that what follows is exactly 3 more lines.
^.*\r?\n\S+[^\S\r\n]+(\S+).*(?=(?:\r?\n.*){3}$)
In parts:
^ Start of string
.*\r?\n Match the whole first line and a newline
\S+ Match 1+ non whitespace chars (the first "word")
[^\S\r\n]+ Match 1+ times a whitespace char except newlines
(\S+) Capture group 1 Match 1+ times a non whitespace char (the second "word")
.* Match the rest of the line
(?= Positive lookahead, assert what follows on the right is
(?:\r?\n.*){3}$ Match 3 times a newline followed by 0+ times any char except a newline, then assert the end of the string
) Close lookahead
Regex demo
You could also turn the lookahead into a match instead:
^.*\r?\n\S+[^\S\r\n]+(\S+).*(?:\r?\n.*){3}$
Regex demo
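To illustrate how capture group 1 is read from the lookahead version of the pattern, here is a minimal sketch in Python (an assumption; the question does not name the engine). It uses the sample text from the question with plain \n line endings.

```python
import re

# First line, then the first "word" of line two, then capture the second word;
# the lookahead requires exactly three more lines before the end of the string.
SECOND_WORD = re.compile(r"^.*\r?\n\S+[^\S\r\n]+(\S+).*(?=(?:\r?\n.*){3}$)")

text = (
    "123 anywhere Avenue\n"
    "Winnipeg, Manitoba R3E 0L7\n"
    "Canada\n"
    "Pharmacy Manager: person person\n"
    "Pharmacy Licence Holder/Owner: 123456 Manitoba Ltd."
)

match = SECOND_WORD.search(text)
province = match.group(1) if match else None
print(province)
```

Note that the province is in group 1, not in the overall match, which also includes the first line and the rest of the second line.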
I'm trying to develop a regex with the following rules:
it should accept solely numbers,
if the string contains any letters or any other special characters, the whole string should be rejected,
regarding spaces, there should only be one consecutive number group, which can be surrounded by spaces,
if there is more than one consecutive number group, with spaces in between the groups, the whole string should be rejected.
Example Cases:
accepted:
1234
[SPACE][SPACE]111[SPACE]
[SPACE]111[SPACE][SPACE]
declined:
1a234
aa1234aa
1234a
12#4
[SPACE]11[SPACE]111
[SPACE]11[SPACE]111#
So far, I've come up with this ([0-9]+[^\s]*) which can be seen here.
What modifications do I have to do to achieve the scenario I want above?
Use this:
^\s*\d+\s*$
All we need to do is accept one or more digits bounded by zero or more spaces on either side.
EDIT:
Just add a capturing group around the digits to use them later:
^\s*(\d+)\s*$
Demo
The pattern you tried, ([0-9]+[^\s]*), matches 1+ digits followed by 0+ non whitespace characters; the negated character class [^\s]* matches any character except a whitespace char. So in aa1234aa it would match 1234aa, letters included.
It can match multiple times in the same string as there are no anchors asserting the start ^ and the end $ of the string.
If you want to match spaces, instead of matching \s which could also match newlines, you could match a single space and repeat that 0+ times on the left and on the right side.
^ *[0-9]+ *$
Regex demo
If you only need the digits, you could use a capturing group
^ *([0-9]+) *$
Regex demo
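The space-only pattern with a capturing group can be checked against the accepted and declined cases from the question. This is a sketch in Python (an assumption, since the question names no language); extract_digits is a hypothetical helper name, and [SPACE] from the question is written as a literal space.

```python
import re
from typing import Optional

# Optional spaces, one group of ASCII digits, optional spaces, nothing else.
DIGITS_ONLY = re.compile(r"^ *([0-9]+) *$")

def extract_digits(value: str) -> Optional[str]:
    """Return the digit group, or None when the string is rejected."""
    match = DIGITS_ONLY.match(value)
    return match.group(1) if match else None

accepted = ["1234", "  111 ", " 111  "]
declined = ["1a234", "aa1234aa", "1234a", "12#4", " 11 111", " 11 111#"]

for sample in accepted + declined:
    print(repr(sample), extract_digits(sample))
```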
^\s*[0-9]+\s*$
Notice that I've used [0-9] instead of \d.
[0-9] accepts only the ASCII digits 0 to 9 (Western Arabic numerals).
\d may accept any Unicode digit, such as Eastern Arabic or Thai numerals (١, ٢, ٣, ๑, ๒, ๓, etc.), depending on the regex engine; at least this is the case in XSD regex when it validates an XML file.
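Whether \d matches non-ASCII digits depends on the engine. As one concrete data point, Python 3 (chosen here as an assumption; the answer above only mentions XSD) treats \d as Unicode-aware for str patterns unless the re.ASCII flag is set:

```python
import re

# Eastern Arabic digits one, two, three, written as escapes to avoid
# any dependence on the source file encoding.
eastern_arabic = "\u0661\u0662\u0663"

# \d on a str pattern matches any Unicode decimal digit by default.
unicode_d = bool(re.fullmatch(r"\d+", eastern_arabic))

# [0-9] only ever matches the ten ASCII digits.
ascii_class = bool(re.fullmatch(r"[0-9]+", eastern_arabic))

# re.ASCII restricts \d to [0-9].
ascii_flag = bool(re.fullmatch(r"\d+", eastern_arabic, flags=re.ASCII))

print(unicode_d, ascii_class, ascii_flag)
```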
I want to create a regex that matches only when the string has three or more characters and, if the string contains any + signs, there are at least three characters before and after each + sign.
I have created a regex that fulfills all of these requirements except one: there must be at least three characters before the first + sign, but it also matches with fewer.
This is my current regex: (\+[a-z0-9]{3}|[a-z0-9]{0,3})$
The string ab+abx should not match, but it does with my regex.
Example:
Valid Strings:
sss
sdfsgdf
4534534
dfs34543
sdafds+3232+sfdsafd
qwe+sdf
234+567
cvb+243
Invalid Strings:
a
aa
a+
aa+
+aa
+a
a+a
aa+aa
aaa+a
You can use this regex,
^[^+\n]{3,}(?:\+[^+\n]{3,})*$
Explanation:
^ - Start of string
[^+\n]{3,} - Matches any character except + and newline, at least three times; you can remove the \n if the input you're trying to match doesn't contain any newlines
(?:\+[^+\n]{3,})* - Matches a + character followed by at least three more such characters; the whole group repeats zero or more times, which keeps the + character optional
$ - End of input
Demo
Edit: Updated solution for the case where spaces should not count toward the minimum of three characters required on either side of +.
You can use this regex to skip spaces when counting,
^(?:[^+\n ] *){3,}(?:\+ *(?:[^+\n ] *){3,})*$
Demo
Also, in case you're dealing with only alphanumeric text, you can use this simpler and easier to maintain regex,
^(?:[a-z0-9] *){3,}(?:\+ *(?:[a-z0-9] *){3,})*$
Demo
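A brief sketch of how the alphanumeric, space-ignoring variant behaves, written in Python as an assumption (no language is named in the question). The inputs containing spaces are made up for illustration and are not from the question.

```python
import re

# Each "[a-z0-9] *" unit is one counted character plus any trailing spaces,
# so spaces never contribute to the minimum of three per segment.
SPACED = re.compile(r"^(?:[a-z0-9] *){3,}(?:\+ *(?:[a-z0-9] *){3,})*$")

def matches(value: str) -> bool:
    return SPACED.match(value) is not None

# Hypothetical inputs with spaces around and inside the segments.
results = {
    "abc + def": matches("abc + def"),
    "a b c+d e f": matches("a b c+d e f"),
    "ab + abc": matches("ab + abc"),
    "a b+abc": matches("a b+abc"),
}
print(results)
```

Note that, as written, the pattern still requires the string to begin with a counted character, so a leading space is rejected.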
You could match 3 or more of the chars listed in the character class [a-z0-9], then repeat 0+ times a plus sign followed by 3 or more of those chars:
^[a-z0-9]{3,}(?:\+[a-z0-9]{3,})*$
That will match:
^ Start of string
[a-z0-9]{3,} Match 3+ times what is listed in the character class
(?: Non capturing group
\+[a-z0-9]{3,} Match + sign followed by matching 3+ times what is listed in the character class
)* Close group and repeat 0+ times
$ End of string
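The pattern can be run against the valid and invalid strings listed in the question. A minimal sketch, assuming Python (the question names no language; is_valid is a hypothetical helper name):

```python
import re

# One segment of 3+ lowercase letters or digits, then zero or more
# "+segment" repetitions, each again with at least 3 characters.
SEGMENTS = re.compile(r"^[a-z0-9]{3,}(?:\+[a-z0-9]{3,})*$")

def is_valid(value: str) -> bool:
    return SEGMENTS.match(value) is not None

valid = ["sss", "sdfsgdf", "4534534", "dfs34543",
         "sdafds+3232+sfdsafd", "qwe+sdf", "234+567", "cvb+243"]
invalid = ["a", "aa", "a+", "aa+", "+aa", "+a", "a+a", "aa+aa", "aaa+a",
           "ab+abx"]

for sample in valid + invalid:
    print(sample, is_valid(sample))
```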
I have trouble understanding why my regex query takes one extra character besides the symbols I have told regex to include into the query, so this is my regex:
([\-:, ]{1,})[^0-9]
This is my test text:
Test- Product-: 1 --- 3 hour ,--kayak:--rental
It always includes the first character of each starting word, like P on Product or h on hour, how can I prevent regex from including those first characters?
I am trying to match all dashes, colons, commas and spaces, excluding numbers and any other characters.
The [^0-9] part of your regex matches any char but a digit, so you should remove it from your pattern.
There is no need to wrap the character class with a capturing group, and {1,} is equal to +, so the whole regex can be shortened to
[-:, ]+
Note that - in the initial and end positions inside a character class does not have to be escaped.
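To compare the original and shortened patterns on the test text from the question, here is a small sketch in Python (an assumption; the question does not say which engine is used). The variable names are illustrative.

```python
import re

text = "Test- Product-: 1 --- 3 hour ,--kayak:--rental"

# Original pattern: the trailing [^0-9] consumes one extra non-digit char.
original = re.compile(r"([\-:, ]{1,})[^0-9]")
original_matches = [m.group(0) for m in original.finditer(text)]

# Shortened pattern: only runs of dashes, colons, commas and spaces.
separators = re.findall(r"[-:, ]+", text)

print(original_matches)
print(separators)
```

The first whole match of the original pattern is "- P", which is where the stray first letter of Product comes from.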