C# regular expression for a list of IPs 65.232.211.[001-175]

I want to match an IP against my IP list, which is stored in an ArrayList, but the entries are in this format:
65.232.211.[001-175]
e.g. 68.232.211.133 must match
68.232.211.199 must not match
I want a regular expression for this scenario, but I don't know what it would look like.
I tried, but I am not getting the correct answer.
Please help me.

You could use something like this: ^68\\.232\\.211\\.0*([1-9][0-9]?|1[0-6][0-9]|17[0-5])$. The last part matches the numeric range you are after (courtesy of Regex_For_Range). The ^ and $ anchors matter: without them, 68.232.211.199 would still be accepted, because the pattern can match the leading 68.232.211.19.
Since the period is a special character in regex (it matches any character), it needs to be escaped. This is done by putting a backslash in front of it, like so: \. Since you are using C# (it seems), you need to escape the backslash as well, because the backslash is also special in regular C# string literals (a verbatim string, @"...", avoids the doubling).
Alternatively (and even better than the above), you could use the following regex to split the IP in two and do whatever validation you need: ^([\d.]+?)\.(\d+)$. This regex yields 2 groups, so taking 68.232.211.133 as an example, it would yield 68.232.211 and 133.
This lets you compare the initial part of the IP as a string, then take the last section of the IP, convert it to a number and perform the range check with ordinary comparison operators.
In my opinion, the second approach should be favoured since it is easier to maintain.
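As a rough illustration of the second approach, here is a minimal sketch in Python rather than C# (the regexes themselves carry over unchanged). The function name ip_in_range and its default arguments are made up for this example, and the prefix 68.232.211 is taken from the sample addresses in the question, not from the list format. An anchored version of the single-regex approach is included only to show that both agree.

```python
import re

# Split regex from the answer: everything before the last dot, then the last octet.
SPLIT = re.compile(r"^([\d.]+?)\.(\d+)$", re.ASCII)

# Anchored single-regex variant, for comparison only.
SINGLE = re.compile(r"^68\.232\.211\.0*([1-9][0-9]?|1[0-6][0-9]|17[0-5])$")


def ip_in_range(ip, prefix="68.232.211", low=1, high=175):
    """Hypothetical helper: prefix compared as a string, last octet as a number."""
    m = SPLIT.match(ip)
    if m is None:
        return False
    head, last = m.groups()
    return head == prefix and low <= int(last) <= high


for candidate in ("68.232.211.133", "68.232.211.199", "68.232.211.001"):
    print(candidate, ip_in_range(candidate), SINGLE.match(candidate) is not None)
```

Changing the range later only means changing low and high, which is the maintenance advantage mentioned above.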

Related

Regex for Google Analytics Goals

I've searched all the other Regex on Google Analytics questions but I can't use the answers as this is pretty specific to my problem.
I want to set a goal but use Regex to flag it as a goal IF string includes
/client-thank-you/ AND anything EXCEPT hire
so in other words
/client-thank-you/hire is not correct
/client-thank-you/anything/else is correct
Each of the following regexes will match any string that contains /client-thank-you/ and does not contain hire, depending on what assumption(s) you make about where "hire" is in the string.
Solution
Where can "hire" be located in the string?
Anywhere:
((?!hire).)*?/client-thank-you/((?!hire).)*
Only following the "/client-thank-you/":
.*?/client-thank-you/((?!hire).)*
Only immediately following the "/client-thank-you/":
.*?/client-thank-you/(?!hire).*
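To see how the three variants differ, here is a small sketch in Python, used here only because its re module supports lookahead (GA itself is not involved). The sample paths are invented for illustration, and re.fullmatch is used because each regex is meant to match the entire string.

```python
import re

ANYWHERE = re.compile(r"((?!hire).)*?/client-thank-you/((?!hire).)*")
FOLLOWING = re.compile(r".*?/client-thank-you/((?!hire).)*")
IMMEDIATELY = re.compile(r".*?/client-thank-you/(?!hire).*")

# Invented sample paths, one per interesting position of "hire".
samples = [
    "/client-thank-you/hire",
    "/client-thank-you/anything/else",
    "/hire/client-thank-you/x",
    "/client-thank-you/x/hire",
]

for path in samples:
    results = [
        rx.fullmatch(path) is not None
        for rx in (ANYWHERE, FOLLOWING, IMMEDIATELY)
    ]
    print(path, results)
```

Only the first variant rejects a "hire" that appears before /client-thank-you/, and only the third accepts a "hire" that appears later in the path.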
Notes
Optimization:
Each of these regexes will match the entire string. If your tool lets you determine if a string contains a substring match (rather than naively attempting to match the entire string), then you could optimize the second and third regexes by removing the leading .*?. Likewise, the third regex could be further optimized by removing the trailing .* as well.
Positively require "anything":
Note that all of these regexes assume that a string that ends with "/client-thank-you/" (with nothing after it) is valid. If this assumption is incorrect (i.e. the string .*/client-thank-you/$ is not a match), then change the trailing * on every regex to +. This would also mean that you have to keep the last .* on the third regex as a .+ (i.e. don't optimize that away).
EDIT:
The above will not work since GA uses a very limited version of regex (that does not include lookaround). If there is no other GA tool (other than a single regex) that you can use that meets your needs, then you could use the following as a last-ditch effort:
([-._~!$&'()*+,;=:#/0-9A-Za-gi-z]|h[-._~!$&'()*+,;=:#/0-9A-Za-hj-z]|hi[-._~!$&'()*+,;=:#/0-9A-Za-qs-z]|hir[-._~!$&'()*+,;=:#/0-9A-Za-df-z]|.{1,3}$)
And in expanded form for illustration purposes only:
(
  [-._~!$&'()*+,;=:#/0-9A-Za-gi-z]
| h[-._~!$&'()*+,;=:#/0-9A-Za-hj-z]
| hi[-._~!$&'()*+,;=:#/0-9A-Za-qs-z]
| hir[-._~!$&'()*+,;=:#/0-9A-Za-df-z]
| .{1,3}$
)
This regex will match 1-4 characters that do not form "hire". It does so by matching the minimum number of characters necessary to verify that the match is neither "hire" nor can serve as a prefix of "hire". It takes into account end-of-line (e.g. "hir" is valid if there is nothing else after it). The characters that it matches are all valid characters that can occur in the path component of a URL as specified in RFC 3986.
You use this regex by substituting it for every ((?!hire).) in any of the solutions given above. For example:
.*?/client-thank-you/([-._~!$&'()*+,;=:#/0-9A-Za-gi-z]|h[-._~!$&'()*+,;=:#/0-9A-Za-hj-z]|hi[-._~!$&'()*+,;=:#/0-9A-Za-qs-z]|hir[-._~!$&'()*+,;=:#/0-9A-Za-df-z]|.{1,3}$).*
This matches any url that contains "/client-thank-you/" but not "/client-thank-you/hire".
Do be careful, though. Doubled "h"s will make this workaround fail (e.g. "hhire"). However, if "hire" will only ever follow a path delimiter (i.e. /hire/), then that shouldn't be a problem.
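The behaviour of the workaround, including the doubled-"h" caveat, can be checked with a short Python sketch. Python is used only as a convenient test bed and says nothing about what GA's engine accepts. The names C, NOT_HIRE, goal, repeated and tempered are invented here; the character classes are the ones from the regex above, assembled from a shared prefix to keep the lines short.

```python
import re

# Shared part of every character class in the workaround.
C = r"-._~!$&'()*+,;=:#/0-9A-Z"

# Lookaround-free stand-in for ((?!hire).): consumes 1-4 characters.
NOT_HIRE = rf"(?:[{C}a-gi-z]|h[{C}a-hj-z]|hi[{C}a-qs-z]|hir[{C}a-df-z]|.{{1,3}}$)"

# "Immediately following" solution with the stand-in substituted.
goal = re.compile(rf".*?/client-thank-you/{NOT_HIRE}.*")

# Repeating the stand-in versus repeating the real tempered dot.
repeated = re.compile(rf"{NOT_HIRE}*")
tempered = re.compile(r"((?!hire).)*")

print(goal.fullmatch("/client-thank-you/hire") is not None)
print(goal.fullmatch("/client-thank-you/anything/else") is not None)
print(repeated.fullmatch("hhire") is not None)  # the caveat: "hire" slips through
print(tempered.fullmatch("hhire") is not None)
```

The stand-in consumes "hh" as one unit and then sees only "ire", which is why a repeated stand-in accepts "hhire" while the lookahead version rejects it.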
If you can't use a lookahead like Travis suggested, then I suggest setting the goal to fire on an event instead of a pageview.
If you're using Google Tag Manager, you'll have the ability to write a more advanced regex, or at least set a blocking rule for the event that prevents it from firing when 'hire' is in the page URL.

Custom email validation regex pattern not working properly

So I've got /.+[^\x20-\x2A\x2C\x2F\x3A-\x40\x5B-\x5E\x60\x7B-\xFF]\#[\w+-?]+(.{1})\w{2,}/ pattern I want to use for email validation on client-side, which doesn't work as expected.
I know that my pattern is simple and doesn't cover every standard possibility, but it's part of my regex training.
Local part of address should be valid only when it has at least one digit [0-9] or letter [a-zA-Z] and can be mixed with comma or plus sign or underscore (or all at once) and then # sign, then domain part, but no IP address literals, only domain names with at least one letter or digit, followed by one dot and at least two letters or two digits.
In my test form it doesn't validate a#b.com and does validate baz_bar.test+private#e-mail-testing-service..com, which is wrong. It should be the other way round: validate a#b.com and reject baz_bar.test+private#e-mail-testing-service..com
What specific error do I have there, and where?
I can't locate it, sorry.
You need to change your regex
From: .+[^\x20-\x2A\x2C\x2F\x3A-\x40\x5B-\x5E\x60\x7B-\xFF]\#[\w+-?]+(\.{1})\w{2,}
To: .+[^\x20-\x2A\x2C\x2F\x3A-\x40\x5B-\x5E\x60\x7B-\xFF]?\#[\w+-]+(\.{1})\w{2,}
Notice that I added a ? before the # sign and removed the ? from the first "group" after the # sign. The added ? tells the regex that the whole preceding "group" is optional.
See it working here: https://regex101.com/r/iX5zB5/2
You're requiring the local part (before #) to be at least two characters with the .+ followed by the character class [^...]. It's looking for any character followed by another character not in the list of exclusions you specify. That explains why "a#b.com" doesn't match.
The second problem is partly caused by the character class range +-? which includes the . character. I think you wanted [-\w+?]+. (Do you really want question marks?) And then later I think you wanted to look for a literal . character but it really ends up matching the first character that didn't match the previous block.
Between the regex provided and the explanatory text I'm not sure what rules you intend to implement though. And since this is an exercise it's probably better to just give hints anyway.
You will also want to use the ^ and $ anchors to make sure the entire string matches.
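The point about the +-? range is easy to verify. The sketch below uses Python purely for illustration (the question is about a client-side pattern), and the names wide, narrow and in_range are invented for this example.

```python
import re

# Inside a character class, '+-?' is a range from '+' (0x2B) to '?' (0x3F).
wide = re.compile(r"[\w+-?]+")

# With the hyphen first it is a literal, so only '-', word characters, '+' and '?' match.
narrow = re.compile(r"[-\w+?]+")

# Every character covered by the accidental range.
in_range = [chr(code) for code in range(ord("+"), ord("?") + 1)]
print("".join(in_range))

domain = "e-mail-testing-service..com"
print(wide.fullmatch(domain) is not None)    # dots are swallowed by the range
print(narrow.fullmatch(domain) is not None)  # dots are no longer allowed
```

Because the range already accepts dots, the doubled dot is consumed by the domain part before the pattern ever reaches the piece that was meant to match a single literal dot.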

List of allowed characters from regular expression

Does anyone know a way to extract the allowed characters from a regular expression and construct a user-friendly message?
For example, by providing regular expression
^[a-zA-Z0-9&\-\+_\.\s]{1,10}$
to get something like
a-z A-Z 0-9 & - + _ . with spaces
I am using Java. I can imagine that it could be too complicated or even impossible to cover all types of regular expressions, but maybe you know of some library, tool or algorithm that could help.
Thanks
Yes. It can be done.
What you need is:
Turn your regexp body into a string.
Parse that string (with a regex, for instance) to produce the desired list.
Apply any regexp options (such as ignore case) to the result.
This is tedious work if you're not VERY familiar with Regexp. I actually have code in production doing just that, but it's proprietary so I can't post it here and it's not in Java.
I guess you should first ask yourself whether there is no simpler solution for your problem. If for instance your regexp is a constant, you could associate it with a by-hand list of accepted characters.
If your input is a character-class like the one you provided, you could match it with the expression
([^\\]-[^\\]|\\.|[^^$[\]])
that will give you a list of elements like "a-z", "\+", "_" that you could then tidy up a little further, e.g., removing the "\", and then print it nicely formatted.
And you could extract the length information using
{([0-9]+)(,([0-9]+))?}
that accepts {1,10} as well as {10} with the "from" and "to" values being captured each in their own group.
That should get you started.
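Put together, the two expressions above could be used roughly like this. The sketch is in Python for brevity, although the question mentions Java; the same patterns should port with little change. CLASS_BODY and describe are hypothetical helpers introduced here, and the code assumes the input contains exactly one character class followed by a {m,n} or {n} quantifier, as in the example from the question.

```python
import re

# Hypothetical helper pattern: captures the body of the first [...] class.
CLASS_BODY = re.compile(r"\[((?:\\.|[^\]\\])*)\]")

# Element pattern from the answer (the '[' inside the class is escaped here).
ELEMENT = re.compile(r"([^\\]-[^\\]|\\.|[^^$\[\]])")

# Length pattern from the answer, with the braces escaped.
LENGTH = re.compile(r"\{([0-9]+)(,([0-9]+))?\}")


def describe(regex):
    """Return (allowed elements, minimum length, maximum length)."""
    body = CLASS_BODY.search(regex).group(1)
    elements = []
    for element in ELEMENT.findall(body):
        if element == r"\s":
            elements.append("spaces")             # friendlier name for \s
        else:
            elements.append(element.lstrip("\\"))  # drop the escaping backslash
    m = LENGTH.search(regex)
    low = int(m.group(1))
    high = int(m.group(3) or m.group(1))           # {10} means exactly 10
    return elements, low, high


elements, low, high = describe(r"^[a-zA-Z0-9&\-\+_\.\s]{1,10}$")
print(" ".join(elements), f"({low} to {high} characters)")
```

This only handles simple classes; negated classes, nested constructs and other escapes such as \d would need extra cases.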

Multiple spaces, multiple commas and multiple hyphens in alphanumeric regex

I am very new to regex and regular expressions, and I am stuck in a situation where I want to apply a regex to a JSF input field.
Where
alphanumeric
multiple spaces
multiple dot(.)
multiple hyphen (‐)
are allowed, and Minimum limit is 1 and Maximum limit is 5.
And for multiple values - they must be separated by comma (,)
So a Single value can be:
3kd-R
or
k3
or
-4
And multiple values (must be comma separated):
kdk30,3.K-4,ER--U,2,.I3,
By the help of stackoverflow, so far I am able to achieve only this:
(^[a-zA-Z0-9 ]{5}(,[a-zA-Z0-9 ]{5})*$)
Something like
^[-.a-zA-Z0-9 ]{1,5}(,[-.a-zA-Z0-9 ]{1,5})*$
Changes made
[-.a-zA-Z0-9 ] Added - and . to the character class so that those are matched as well.
{1,5} Quantifier, ensures that it is matched minimum 1 and maximum 5 characters
Regex demo
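A quick way to check the pattern against the examples from the question is sketched below in Python (the pattern uses nothing engine-specific, so a Java-based JSF validator should behave the same way). The name VALUES is invented for this example.

```python
import re

VALUES = re.compile(r"^[-.a-zA-Z0-9 ]{1,5}(,[-.a-zA-Z0-9 ]{1,5})*$")

for candidate in ("3kd-R", "k3", "-4", "kdk30,3.K-4,ER--U,2,.I3", "toolong", "a,,b"):
    print(repr(candidate), VALUES.match(candidate) is not None)
```

Note that a trailing comma, as in "k3,", is rejected, because every comma must be followed by a segment of 1 to 5 characters.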
You've done pretty well. You need to add the hyphen and the dot to that first character class. Note: since the hyphen denotes ranges within a character class, you need to put it where it positionally cannot specify a range, i.e., first or last, rather than somewhere it merely looks like an invalid range (e.g., 7-.). So your first character class would look something like this:
[a-zA-Z 0-9.-]{1,5} or [-a-zA-Z0-9 .]{1,5}
So, we've just defined what one segment looks like. That pattern can reoccur zero or more times. Of course, there are many ways to do that, but I would favor a regex subroutine because this allows code reuse. Now if the specs change or you're testing and realize you have to tweak that segment pattern, you only need to change it in one place.
Subroutines are not supported in BRE or ERE, but most widely-used modern regex engines support them (Perl, PCRE, Ruby, Delphi, R, PHP). They are very simple to use and understand. Basically, you just need to be able to refer to it (sound familiar? refer-back? back-reference?), so this means we need to capture the regex we wish to repeat. Then it's as simple as referring back to it, but instead of \1 which refers to the captured value (data), we want to refer to it as (?1), the capturing expression. In doing so, we've logically defined a subroutine:
([a-zA-Z 0-9.-]{1,5})(,(?1))*
So, the first group basically defines our subroutine and the second group consists of a comma followed by the same segment-definition expression we used for the first group, and that is optional ('*' is the zero-or-more quantifier).
If you operate on large quantities of data where efficiency is a consideration, don't capture when you don't have to. If your sole purpose for using parentheses is to alternate (e.g., \b[bB](asset|eagle)\b hound) or to quantify, as in our second group, use the (?: ... ) notation, which signifies to the regex engine that this is a non-capturing group. Without going into great detail, there is a lot of overhead in maintaining the match locations--not that it's complex, per se, just potentially highly repetitive. Regex engines will match, store the information, then when the match fails, they "give up" the match and try again starting with the next matching substring. Each time they match your capture group, they're storing that information again. Okay, I'm off the soapbox now. :-)
So, we're almost there. I say "almost" because I don't have all the information. But if this should be the sole occupant of the "subject" (line, field, etc.--the data sample you're evaluating), you should anchor it to "assert" that requirement. The caret '^' is beginning of subject, and the dollar '$' is end of subject, so by encapsulating our expression in ^ ... $ we are asserting that the subject matches in its entirety, front-to-back. These assertions have zero length; they consume no data, only assert a relative position. You can operate on them, e.g., s/^/  / would indent your entire document two spaces. You haven't really substituted the beginning of line with two spaces, but you're able to operate on that imaginary, zero-length location. (Do some research on zero-length assertions [aka zero-width assertions, or look-arounds] to uncover a powerful feature of modern regex. For example, in the previous regex, if I wanted to make sure I did not insert two spaces on blank lines: s/^(?!$)/  /)
Also, you didn't say if you need to capture the results to do something with them. My impression was it's validation only, so that's not necessary. However, if it is needed, you can wrap the entire expression in capturing parentheses: ^( ... )$.
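The two substitutions mentioned above can be tried out as follows. This is a sketch in Python, where re.MULTILINE is needed so that ^ and $ apply to every line; the sample text is made up.

```python
import re

text = "first line\n\nsecond line"

# s/^/  / : insert two spaces at the zero-length start of every line.
indent_all = re.sub(r"^", "  ", text, flags=re.MULTILINE)

# s/^(?!$)/  / : same, but skip positions where the line ends immediately.
indent_nonblank = re.sub(r"^(?!$)", "  ", text, flags=re.MULTILINE)

print(repr(indent_all))
print(repr(indent_nonblank))
```

Nothing is consumed in either case; the replacement text is inserted at the asserted position.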
I'm going to provide a final solution that does not assume you need to capture but does assume the entire subject should consist of this value:
^([a-zA-Z 0-9.-]{1,5})(?:,(?1))*$
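Not every engine supports (?1); neither Python's built-in re module nor java.util.regex does. In those engines the same "define the segment once" idea can be approximated by building the pattern from a string. This is a swapped-in technique, not a subroutine call, and the names SEGMENT and LIST are invented for this sketch.

```python
import re

# The segment is defined in exactly one place...
SEGMENT = r"[a-zA-Z 0-9.-]{1,5}"

# ...and reused by string composition instead of a (?1) subroutine call.
LIST = re.compile(rf"^{SEGMENT}(?:,{SEGMENT})*$")

for candidate in ("kdk30,3.K-4,ER--U,2,.I3", "k3", "toolong", "a,,b"):
    print(repr(candidate), LIST.match(candidate) is not None)
```

If the segment rules change, only SEGMENT has to be edited, which is the same maintenance benefit the subroutine gives.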
I know I went on a bit, but you said you were new to regex, so wanted to provide some detail. I hope it wasn't too much detail.
By the way, an excellent resource with tutorials is regular-expressions dot info, and a wonderful regex development and testing tool is regex101 dot com. And I can never say enough about stack overflow!

Regular Expression to not allow disposable email addresses

I'm trying to create a regex that does not allow disposable email addresses but allows everything else. So far, here is what I have:
^[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+)*#(((?:[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?\.)+[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9]))(?=.*(?!.*mailinator.com))(?=.*(?!.*trbvm.com))(?=.*(?!.*guerrillamail.com))(?=.*(?!.*guerrillamailblock.com))(?=.*(?!.*sharklasers.com))(?=.*(?!.*guerrillamail.net))(?=.*(?!.*guerrillamail.org))(?=.*(?!.*guerrillamail.biz))(?=.*(?!.*spam4.me|grr.la))(?=.*(?!.*guerrillamail.de))(?=.*(?!.*grandmasmail.com))(?=.*(?!.*zetmail.com))(?=.*(?!.*vomoto.com))(?=.*(?!.*abyssmail.com))(?=.*(?!.*anappthat.com))(?=.*(?!.*eelmail.com))(?=.*(?!.*yopmail.com))(?=.*(?!.*fakeinbox.com)))$
Right now, it accepts all email addresses.
Try this slightly modified regex using lookbehind:
^[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+)*#(((?:[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?\.)+[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9]))(?<!mailinator\.com)(?<!trbvm\.com)(?<!guerrillamail\.com)(?<!guerrillamailblock\.com)(?<!sharklasers\.com)(?<!guerrillamail\.net)(?<!guerrillamail\.org)(?<!guerrillamail\.biz)(?<!spam4\.me)(?<!grr\.la)(?<!guerrillamail\.de)(?<!grandmasmail\.com)(?<!zetmail\.com)(?<!vomoto\.com)(?<!abyssmail\.com)(?<!anappthat\.com)(?<!eelmail\.com)(?<!yopmail\.com)(?<!fakeinbox\.com))$
It matches bob#gmail.com but does not match bob#mailinator.com.
Fundamentally, you had a regex to match any email address, followed by positive and negative lookaheads like (?=.*(?!.*mailinator.com)). By the time those lookaheads are executed, you're already at the end of the string (further enforced by the $).
Looking ahead from the end of the string there is… nothing. Any lookahead (positive or negative) into nothingness will either always pass, or always fail, regardless of the input string. E.g. a lookahead of (?=.*) at the end of a string will always pass (.* matches the empty string), whereas one of (?=.) will always fail (. does not match the empty string).
In your case, the lookaheads like (?=.*(?!.*mailinator.com)) are okay with the nothingness beyond the end of the input string, so always pass. It's identical to if you didn't have them in the regex at all.
The simple fix, without overhauling the regex entirely, is to look behind with the (?<!) construct, instead of ahead. You're at the end of the string, and want to ensure it didn't end with one of the disposable email domains you have listed. To do that for one domain, it would be (?<!mailinator\.com).
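The difference can be reproduced with a deliberately reduced pattern. The sketch below is in Python and replaces the full address regex with ^\S+ so that only the lookaround behaviour is visible; the names broken and fixed are invented, and only one blocked domain is used.

```python
import re

# Lookahead placed where the match has already reached the end of the string:
# '.*' matches the empty remainder, so the inner negative lookahead always passes.
broken = re.compile(r"^\S+(?=.*(?!.*mailinator\.com))$")

# Lookbehind at the same position inspects the characters just consumed.
fixed = re.compile(r"^\S+(?<!mailinator\.com)$")

for address in ("bob#gmail.com", "bob#mailinator.com"):
    print(address, broken.search(address) is not None, fixed.search(address) is not None)
```

The first pattern accepts both addresses, while the second rejects the one ending in mailinator.com.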
There are many disposable email domains and they are constantly changing. Writing a regex for them is only going to capture a small number and will require constant maintenance and updating.
You may want to look at using some open source lists eg. https://github.com/disposable/disposable and then build a way to update them.
Alternatively you can use something like Upollo's free tier which does this for you.