Excluding Canadian postal codes during shipping calculation - regex

I'm currently designing a website on Shopify, and I have to create shipping rules using Parcelify.
We've managed to get our account onto the legacy version, which allows us to use regex to restrict where we can and can't ship. The only thing is that I don't know anything about regex, so I followed a couple of tutorials online and came up with a few options, but none of them do what I want:
Allow shipping anywhere except to postal codes starting with:
g0c
g0e
g0g
g0j
g0t
g0w
g4r
g4t
g4w
g5j
g5l
g8p
j0m
So I've come up with this. I know it can probably be much simpler, but I'm just trying to get this rule to work. Maybe I'm totally off, which is why I'm reaching out for help here.
/(^(?!g0C|G0E|G0G|G0J|g0t|g0w|g4r|g4t|g4w|g4w|g4x|g5j|g5l|g8p|j0m)) ?([a-zA-Z0-9]*.{3}$)/gim
From what I understand, a negative lookahead is the key to excluding every postal code whose FSA (the first three characters of a postal code) is listed above.
When I try it in regex101, everything seems fine (unless I just don't get how to read the results), but once I put it into the Shopify app (Parcelify), customers with acceptable postal codes cannot place an order, because they get blocked at the shipping step.
Every Canadian postal code is made up of 6 characters, not counting the space in the middle.

If the string can also begin with a space, you can add that to the negative lookahead to rule out that as well.
The /i makes the pattern case insensitive.
Also allowing spaces at the end:
^(?! ?(?:g0C|G0E|G0G|G0J|g0t|g0w|g4r|g4t|g4w|g4x|g5j|g5l|g8p|j0m)) ?(?:[a-zA-Z0-9] *){6}$
The pattern matches:
^ Start of string
(?! Negative lookahead, assert that what follows is not
 ?(?:g0C|G0E|G0G|G0J|g0t|g0w|g4r|g4t|g4w|g4x|g5j|g5l|g8p|j0m) An optional space followed by any of the alternatives
) Close the lookahead
 ? Match an optional space (or  * for multiple spaces)
(?:[a-zA-Z0-9] *){6} Repeat 6 times matching a char from the character class followed by optional spaces
$ End of string
A bit shortened version using character classes and accepting no spaces at the end:
^(?! ?(?:g0[Cw]|G0[EGJ]|g0t|g4[rtwx]|g5[jl]|g8p|j0m)) ?(?:[a-zA-Z0-9] *){5}[a-zA-Z0-9]$
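A quick way to sanity-check the shortened pattern outside of Parcelify is a small script. The sketch below uses Python's re module, with re.IGNORECASE standing in for the /i flag. The function name is_shippable and the sample postal codes are made up for illustration, and Parcelify's own regex engine may behave differently.

```python
import re

# Shortened pattern from the answer above; re.IGNORECASE plays the role of /i.
SHIPPABLE = re.compile(
    r"^(?! ?(?:g0[Cw]|G0[EGJ]|g0t|g4[rtwx]|g5[jl]|g8p|j0m))"
    r" ?(?:[a-zA-Z0-9] *){5}[a-zA-Z0-9]$",
    re.IGNORECASE,
)


def is_shippable(postal_code: str) -> bool:
    """Return True when the code has six alphanumerics and its FSA is not excluded."""
    return SHIPPABLE.search(postal_code) is not None


# Hypothetical sample inputs: allowed codes, excluded FSAs, and a code that is too short.
for code in ["H2X 1Y4", "h2x1y4", "G0C 1A0", "g4r2b3", " J0M 1A1", "H2X 1Y"]:
    print(repr(code), is_shippable(code))
```

If the app rejects codes that this script accepts, the difference is likely in how the app applies the pattern (flags, anchoring, or delimiters) rather than in the lookahead itself.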

Related

Using PCRE2 regex with repeating groups to find email addresses

I need to find all email addresses made up of an arbitrary number of alphanumeric words separated by periods. To test the regex, I'm using the website https://regex101.com/.
The structure of a valid email address is word1.word2.wordN#word1.word2.wordN.word.
The regex /[a-zA-Z0-9.]+#[a-zA-Z0-9.]+.[a-zA-Z0-9]+/gm finds all email addresses included in the document string, but also includes invalid addresses like ........#....com, if present.
I tried to group the repeating parts by using round brackets and a Kleene star, but that causes the regex engine to collapse.
Invalid regex:
/([a-zA-Z0-9]+.?)*[a-zA-Z0-9]+#([a-zA-Z0-9]+.?)*[a-zA-Z0-9]+.[a-zA-Z0-9]+/gm
Although there are many posts about regex groups, I was unable to find an explanation of why the regex engine fails. It seems that the engine gets stuck while trying to find a match.
How can I avoid this problem, and what is the correct solution?
I think the main issue that caused you trouble is this:
. (outside of []) matches any character; you probably meant \. instead, which only matches a literal dot.
There is also no need to make the dot optional with ?, because the non-dot part of your regex will match the alphanumeric characters anyway.
I also simplified the right-hand part (x*x is the same as x+), added the case-insensitive flag, and ended up with this:
/([a-z0-9]+\.)*[a-z0-9]+#([a-z0-9]+\.)+[a-z0-9]+/gmi
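To see the corrected pattern in action, here is a minimal Python sketch. It keeps the # separator exactly as it appears in the question, and the names ADDRESS, find_addresses and the sample text are assumptions made up for this illustration.

```python
import re

# Corrected pattern from the answer; '#' is the separator used in the question.
ADDRESS = re.compile(
    r"([a-z0-9]+\.)*[a-z0-9]+#([a-z0-9]+\.)+[a-z0-9]+",
    re.IGNORECASE,
)


def find_addresses(text: str) -> list[str]:
    # finditer + group(0) yields whole matches; findall would return the capture groups.
    return [m.group(0) for m in ADDRESS.finditer(text)]


# Hypothetical sample text with two valid addresses and one invalid one.
sample = "Write to john.doe#example.com or a.b.c#mail.example.org, not ........#....com"
print(find_addresses(sample))
```

Because every dot is now escaped and must follow at least one alphanumeric character, the run of bare dots no longer qualifies as an address.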

Regex taking too many characters

I need some help with building up my regex.
What I am trying to do is match a specific part of text with unpredictable parts in between the fixed words. An example is the sentence one gets when replying to an email:
On date at time person name has written:
The italic parts are variable; they might contain spaces, or a new line might start at that point.
To get this, I built up my regex as such: On[\s\S]+?at[\s\S]+?person[\s\S]+?has written:
Basically, the [\s\S]+? is supposed to fill in any letter, number, space or line break, as I am unable to predict what could be between the fixed words that I am sure will always be there.
Now comes the hard part: when I add the word "On" somewhere in the text above the sentence that I want to match, the regex matches a much bigger text than I want. This is due to the use of [\s\S]+.
How can I make my regex match as few characters as possible? Using "?" after the "+" to make it lazy does not help.
Example is here with words "From - This - Point - Everything:". Cases are ignored.
Correct: https://regexr.com/3jdek.
Wrong because of added "From": https://regexr.com/3jdfc
The regex is to be used in VB.NET
A more realistic example, with HTML tags, can be found here. Here, I avoided using [\s\S]+? or (.+)?(\r)?(\n)?(.+?)
Correct: https://regexr.com/3jdd1
Wrong: https://regexr.com/3jdfu after adding certain parts of the regex to the text above. In HTML this is barely possible, as the user would never write the matching tag himself, but I do want to make sure my regex is correct, just in case.
These things are certain: I know what the part of the text starts with, no matter where it sits in the entire text; I know what it ends with; and there are specific fixed words that might make the regex more reliable, but they can be omitted. Any text below the searched part is also allowed to be matched, but no text above it may be matched at all.
Another example where it goes wrong: https://regexr.com/3jdli. Basically, I have less to go with in this text, so the regex has less tokens to work with. Adding just the first < already makes the regex take too much.
In my own experience, most problems are avoided when I make sure not to use any [\s\S]+? before a (\r)?(\n)? has come first.
[\s\S] matches every character because it is the union of two complementary sets; it behaves like . with the /s option (dot matches newlines). Quantifiers are greedy by default, so the longest match is returned unless they are made lazy.
Following the "correct" link, the token right after the shortest match must be geschreven. Another way to write this without lazy quantifiers, and a more flexible one, is to prepend a negative lookahead to the repeated character set inside the loop,
so
<blockquote type="cite" [^>]+?>[^O]+?Op[^h]+?heeft(.+?(?=geschreven))geschreven:
becomes
<blockquote type="cite" [^>]+?>[^O]+?Op[^h]+?heeft((?:(?!geschreven).)+)geschreven:
(?: ) is a non-capturing group that simply wraps the negative lookahead and the . (which can be replaced by [\s\S]).
(?! ) inside it is the negative lookahead, which ensures that the current position, before the next character is consumed, is not the start of the end token.
Following the comments, what must not appear in the repeated sequence can be stated explicitly:
From(?:(?!this)[\s\S])+this(?:(?!point)[\s\S])+point(?:(?!everything)[\s\S])+everything:
or
From(?:(?!From|this)[\s\S])+this(?:(?!point)[\s\S])+point(?:(?!everything)[\s\S])+everything:
or
From(?:(?!From|this)[\s\S])+this(?:(?!this|point)[\s\S])+point(?:(?!everything)[\s\S])+everything:
To see what the technique (?:(?!tokens)[\s\S])+ does:
in the first this can't appear between From and this
in the second From or this can't appear between From and this
in the third this or point can't appear between this and point
etc.
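The difference between the lazy version and the lookahead-guarded version can be checked with a short Python sketch (the question targets VB.NET, but the constructs used here behave the same way in Python's re module as far as this example goes). The sample text and the variable names are made up for illustration.

```python
import re

# Hypothetical input: an extra "From" appears above the sentence we want.
text = "From the top of the mail.\nFrom - This - Point - Everything: and the rest"

# Lazy version: the match still starts at the first "From" it can find.
lazy = re.compile(r"From[\s\S]+?this[\s\S]+?point[\s\S]+?everything:", re.IGNORECASE)

# Guarded version: no "From" or "this" may occur between From and this, and so on.
tempered = re.compile(
    r"From(?:(?!From|this)[\s\S])+this"
    r"(?:(?!point)[\s\S])+point"
    r"(?:(?!everything)[\s\S])+everything:",
    re.IGNORECASE,
)

lazy_match = lazy.search(text).group(0)
tempered_match = tempered.search(text).group(0)

print(repr(lazy_match))      # begins at the unwanted first "From"
print(repr(tempered_match))  # begins at the intended "From"
```

Laziness only shortens the right-hand end of a match; it never moves the starting point, which is why the lookahead inside the loop is needed to reject the earlier "From".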

Custom email validation regex pattern not working properly

So I've got /.+[^\x20-\x2A\x2C\x2F\x3A-\x40\x5B-\x5E\x60\x7B-\xFF]\#[\w+-?]+(.{1})\w{2,}/ pattern I want to use for email validation on client-side, which doesn't work as expected.
I know that my pattern is simple and doesn't cover every standard possibility, but it's part of my regex training.
Local part of address should be valid only when it has at least one digit [0-9] or letter [a-zA-Z] and can be mixed with comma or plus sign or underscore (or all at once) and then # sign, then domain part, but no IP address literals, only domain names with at least one letter or digit, followed by one dot and at least two letters or two digits.
In the test-string form it doesn't validate a#b.com and does validate baz_bar.test+private#e-mail-testing-service..com, which is wrong. It should be the other way around: validate a#b.com and reject baz_bar.test+private#e-mail-testing-service..com.
What specific error do I have there, and where?
I can't locate it, sorry.
You need to change your regex
From: .+[^\x20-\x2A\x2C\x2F\x3A-\x40\x5B-\x5E\x60\x7B-\xFF]\#[\w+-?]+(\.{1})\w{2,}
To: .+[^\x20-\x2A\x2C\x2F\x3A-\x40\x5B-\x5E\x60\x7B-\xFF]?\#[\w+-]+(\.{1})\w{2,}
Notice that I added a ? before the # sign and removed the ? from the first "group" after the # sign. Adding that ? tells your regex that the whole "group" is not mandatory.
See it working here: https://regex101.com/r/iX5zB5/2
You're requiring the local part (before #) to be at least two characters with the .+ followed by the character class [^...]. It's looking for any character followed by another character not in the list of exclusions you specify. That explains why "a#b.com" doesn't match.
The second problem is partly caused by the character class range +-? which includes the . character. I think you wanted [-\w+?]+. (Do you really want question marks?) And then later I think you wanted to look for a literal . character but it really ends up matching the first character that didn't match the previous block.
Between the regex provided and the explanatory text I'm not sure what rules you intend to implement though. And since this is an exercise it's probably better to just give hints anyway.
You will also want to use the ^ and $ anchors to make sure the entire string matches.
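The point about the +-? range is easy to verify. The following Python sketch lists the characters that the range covers and compares the class from the question with one where the hyphen is placed first. The names covered, loose, strict and domain_label are assumptions for this illustration only, and the question mark is left out of the stricter class on purpose.

```python
import re

# Every character covered by the range +-? (code points 0x2B through 0x3F).
covered = "".join(chr(c) for c in range(ord("+"), ord("?") + 1))
print(covered)

loose = re.compile(r"[\w+-?]+")  # class from the question: contains the range + to ?
strict = re.compile(r"[-\w+]+")  # hyphen placed first, so it is a literal hyphen

# The part of the second test address between '#' and 'com'.
domain_label = "e-mail-testing-service.."

print(bool(loose.fullmatch(domain_label)))   # dots are inside the range, so this matches
print(bool(strict.fullmatch(domain_label)))  # dots are not allowed here
```

Using fullmatch here has the same effect as wrapping the pattern in ^ and $, which is the anchoring the answer recommends.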

positive look ahead and replace

Recently I've been writing and testing regexps on https://regex101.com/.
My question is: is it possible to do a positive look-ahead AND a replacement in the same regex? Or is only a limited kind of replacement possible?
The input is several lines of phone numbers. Let's say a phone number is correct when it contains exactly 11 digits, no matter how the digits are grouped with - / characters, and no matter whether it starts with + or 00 or the prefix is omitted.
Some example lines:
+48301234567
+48/30/1234567
+48-30-12-345-67
+483011223344556677
0048301234567
+(48)30/1234567
A positive look-ahead can check that, from the beginning to the end of the line, there are exactly 11 digits, regardless of how many of the separator characters listed above sit between them. This works perfectly.
Where the positive look-ahead check passes, I would like to delete every character except the digits. The replacement works fine as long as I don't involve the look-ahead.
Checking with the regexp on its own works perfectly ("gm" modes):
^(?:\+|00)?(?:[\-\/\(\)]?\d){11}$
Checking the replace part works perfectly (replace to nothing):
[^\d\n]
Combining the two, with the check inside a look-ahead followed by the deletion of every non-digit, non-newline character from the matching lines:
(?=^(?:\+|00)?(?:[\-\/\(\)]?\d){11}$)[^\d\n]
Even though I put the ^ and $ inside the look-ahead, the replacement seems to work only from the beginning of each line up to the very first digit.
I know that in real life the replacement and the check would be done separately, but I'm curious whether I can mix look-ahead/look-behind with string operations such as replace and delete, taking the string apart and putting it together as I like.
UPDATE: This is what does the trick, but it feels a bit "ugly" to me. Is there a prettier solution?
https://regex101.com/r/yT5dA4/2
Or the version which I asked originally, where only digits remains: regex101.com/r/yT5dA4/3
A regex by itself cannot replace or delete text. It is just a tool for matching strings; the surrounding tool then takes some action on the matched text, e.g. performs a substitution or retrieves the second capture group.
However it is possible to perform certain decisions within a regex engine, by using conditionals. The common syntax for this, with a lookahead assertion, is (?(?=regex)then|else).
With conditionals you can change the behaviour depending on how the text matches the regex. For your example you could do something like:
^(\+)?(?(1)\(|\d)
If the phone number starts with a plus it must be followed by a bracket, else it should start with a digit. Although in your situation, this is not very useful.
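The conditional above can be tried in Python, whose re module supports the group-reference form (?(1)then|else). To my knowledge it does not accept a lookahead as the condition, so only this form is shown. The function name has_valid_start is hypothetical.

```python
import re

# If group 1 (the plus sign) took part in the match, require "(", otherwise a digit.
STARTS_OK = re.compile(r"^(\+)?(?(1)\(|\d)")


def has_valid_start(number: str) -> bool:
    return STARTS_OK.match(number) is not None


for number in ["+(48)30/1234567", "48301234567", "+48301234567"]:
    print(number, has_valid_start(number))
```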
If you want to read up more on conditionals in regex you can do so here.
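Outside of a regex tester, the usual way to get the result asked for is to keep the two steps separate: validate each line with the look-ahead-free pattern, then strip the non-digits from the lines that passed. A minimal Python sketch follows; the function name clean_numbers is an assumption, and the sample lines are the ones from the question.

```python
import re

# Validation pattern from the question: optional + or 00, then exactly 11 digits,
# each optionally preceded by one separator character.
VALID = re.compile(r"^(?:\+|00)?(?:[\-\/\(\)]?\d){11}$")


def clean_numbers(lines: list[str]) -> list[str]:
    """Keep only the valid lines and remove every non-digit character from them."""
    cleaned = []
    for line in lines:
        if VALID.match(line):
            cleaned.append(re.sub(r"\D", "", line))
    return cleaned


samples = [
    "+48301234567",
    "+48/30/1234567",
    "+48-30-12-345-67",
    "+483011223344556677",
    "0048301234567",
    "+(48)30/1234567",
]
print(clean_numbers(samples))
```

Note that a leading 00 prefix consists of digits, so it survives the cleanup; dropping it would need an extra step.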

Regex to "ignore" not "exclude"

I'm totally lost. I need a regular expression that
can detect any of the 4 URL prefixes below:
^(.*http://.*|.*http%3A%2F%2F.*|.*https://.*|.*https%3A%2F%2F.*)$
In addition, it
should detect:
(any punctuation or space or backspace)(3 times the letter w in upper or lower case)(one dot)(anything)
And, importantly, it
should ignore, but NOT exclude, the following exact string (whether or not it is present in the page):
http://www.w3.org
This is complicated for me, because I still need to account for it in the regex
even though it is ignored; otherwise it will be matched by
(.*http://.*|.*http%3A%2F%2F.*|.*https://.*|.*https%3A%2F%2F.*)
My aim is to find/match any URL other than
http://www.w3.org
whether or not it is in the page.
So if the page contains only this:
http://www.w3.org
and no other URL, then it shouldn't match.
Thanks Tyler, but my regex knowledge is almost zero. I only know what the commands do when I right-click on them to choose actions, as in Regulazy or RegExr.
So I updated my command according to the URL I gave you:
href%3D%22http%3A%2F%2Fwww%2Edommermuth%2D1%2Ecom
& it works:
https?(://|%3A%2F%2F)(?!www.w3.org)(.*)
But because of my lack of knowledge, I don't understand how to do what is described below:
"What you could do is make the http part optional, or must match http or www or both. This type of regex came up in another question I answered recently - Multiple preg_replace RegEx for different URLs"
I tried to add this, but it doesn't work:
(www.)
All I'm missing now is detection of URLs starting with www:
(any punctuation or space or backspace)(3 times the letter w in upper or lower case)(one dot)(anything till it reaches a space or the end of a line)
OK so try this:
/\bhttps?(://|%3A%2F%2F)(?!www\.w3\.org)(.*)\b/g
Test here: http://regexr.com?38jp5
That test link uses javascript-style regex, but should work elsewhere.
The important part is the second half - a negative lookahead, that checks what follows is not the exact text www.w3.org
I compressed what you had: mine matches http then an optional s then either :// or %3A%2F%2F.
I wrapped the whole thing in word boundaries, you could change that to quotes or whatever you need. The global flag lets you match multiple items.
In regards to OP's questions:
D%22
could appear before http or https
this one is missing & should match:
href%3D%22http%3A%2F%2Fwww%2Edommermuth%2D1%2Ecom
If this matters, just remove the word boundary \b before and after the regex, so the http can match anywhere.
The regex command should detect: (any punctuation or space or backspace)(3 times the letter w in upper or lower case)(one dot)(anything)
This regex would fail to match a link like http://google.com - looking for www is really not a good way to check for a link on its own. What you could do is make the http part optional, or must match http or www or both. This type of regex came up in another question I answered recently - Multiple preg_replace RegEx for different URLs
Edit #2:
(any punctuation or space or backspace)(3 times the letter w in upper or lower case)(one dot)(anything till it reaches a space or the end of a line)
As I mention above, what you are describing will not match a url like http://google.com - but if that is what you want, use this:
(\W|^)[wW]{3}\.\S+
Instead of that, what I think you want is this, which is a combination of my first answer, and the link to a different post above.
((https?(://|%3A%2F%2F))(www\.)|(https?(://|%3A%2F%2F))|(www\.))(?!(www\.)?w3\.org)([^</\?\s]+)[^<\s]*
You'll want to use this regex with the Global and Insensitive flags
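As a rough check of the combined pattern, the Python sketch below runs it over a small made-up page, with re.IGNORECASE as the insensitive flag and finditer playing the role of the global flag. The names URL, find_urls and page are assumptions for this illustration.

```python
import re

# Combined pattern from the answer, split over two lines for readability.
URL = re.compile(
    r"((https?(://|%3A%2F%2F))(www\.)|(https?(://|%3A%2F%2F))|(www\.))"
    r"(?!(www\.)?w3\.org)([^</\?\s]+)[^<\s]*",
    re.IGNORECASE,
)


def find_urls(text: str) -> list[str]:
    # group(0) is the whole match; the numbered groups are not needed here.
    return [m.group(0) for m in URL.finditer(text)]


# Hypothetical page content, including the encoded link from the question.
page = (
    "xmlns http://www.w3.org here\n"
    "link http://google.com and www.example.com/page\n"
    "href%3D%22http%3A%2F%2Fwww%2Edommermuth%2D1%2Ecom\n"
)
print(find_urls(page))
```

A page that contains only http://www.w3.org produces no match at all, which is the behaviour the question asks for.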