Trouble rejecting a specific character in my RegEx - regex

I'm running the following regular expression to check a username:
^(?=.*[a-zA-Z0-9])\w{2,25}\s*$
It works fine, but now I need to amend it to reject any underscores (_). I've tried wedging ^(?!_)$ in there, but it doesn't work for me because it only checks at the beginning or the end.
I know a little about regular expressions but I'm hazy on all the classes. I've found a great resource for it at http://www.regular-expressions.info/reference.html
Thanks for the help, folks.

This should work for you:
^[a-zA-Z][a-zA-Z0-9.\-]{1,24}\s*$
What this regex will validate:
The first character is a letter
The input contains only alphanumeric characters (I also allowed . and -)
If you don't want them, just remove . and \-
The input is 2-25 characters long

Well, you could always replace the \w with its character class excluding _.
^(?=.*[a-zA-Z0-9])[A-Za-z0-9]{2,25}\s*$
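A minimal sketch of the difference, assuming Python's re module (the question doesn't name a language). The sample usernames and the helper name accepted are made up for illustration, and re.ASCII is passed so that \w means exactly [a-zA-Z0-9_].

```python
import re

# Original pattern: \w also accepts underscores.
# re.ASCII limits \w to [a-zA-Z0-9_] (Python's default is Unicode-aware).
with_w = re.compile(r'^(?=.*[a-zA-Z0-9])\w{2,25}\s*$', re.ASCII)

# Explicit class without "_", as suggested above.
no_underscore = re.compile(r'^(?=.*[a-zA-Z0-9])[A-Za-z0-9]{2,25}\s*$')


def accepted(pattern, username):
    """Return True if the username matches the given compiled pattern."""
    return pattern.search(username) is not None


# Hypothetical sample inputs.
for name in ['johndoe', 'john_doe', '_x', 'a']:
    print(name, accepted(with_w, name), accepted(no_underscore, name))
```

Once the class no longer contains the underscore, the lookahead is redundant, since [A-Za-z0-9]{2,25} already guarantees at least one letter or digit.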

Related

Custom email validation regex pattern not working properly

So I've got /.+[^\x20-\x2A\x2C\x2F\x3A-\x40\x5B-\x5E\x60\x7B-\xFF]\#[\w+-?]+(.{1})\w{2,}/ pattern I want to use for email validation on client-side, which doesn't work as expected.
I know that my pattern is simple and doesn't cover every standard possibility, but it's part of my regex training.
Local part of address should be valid only when it has at least one digit [0-9] or letter [a-zA-Z] and can be mixed with comma or plus sign or underscore (or all at once) and then # sign, then domain part, but no IP address literals, only domain names with at least one letter or digit, followed by one dot and at least two letters or two digits.
In test string form it doesn't validate a#b.com and does validate baz_bar.test+private#e-mail-testing-service..com, which is wrong - it should be vice versa - validate a#b.com and not validate baz_bar.test+private#e-mail-testing-service..com
What specific error do I have there, and where?
Sorry, I can't locate it.
You need to change your regex
From: .+[^\x20-\x2A\x2C\x2F\x3A-\x40\x5B-\x5E\x60\x7B-\xFF]\#[\w+-?]+(\.{1})\w{2,}
To: .+[^\x20-\x2A\x2C\x2F\x3A-\x40\x5B-\x5E\x60\x7B-\xFF]?\#[\w+-]+(\.{1})\w{2,}
Notice that I added a ? before the # sign and removed the ? from the first "group" after the # sign. Adding that ? tells the regex that the whole "group" is optional.
See it working here: https://regex101.com/r/iX5zB5/2
You're requiring the local part (before #) to be at least two characters with the .+ followed by the character class [^...]. It's looking for any character followed by another character not in the list of exclusions you specify. That explains why "a#b.com" doesn't match.
The second problem is partly caused by the character class range +-? which includes the . character. I think you wanted [-\w+?]+. (Do you really want question marks?) And then later I think you wanted to look for a literal . character but it really ends up matching the first character that didn't match the previous block.
Between the regex provided and the explanatory text I'm not sure what rules you intend to implement though. And since this is an exercise it's probably better to just give hints anyway.
You will also want to use the ^ and $ anchors to make sure the entire string matches.
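The two hints above can be checked with a short sketch, assuming Python's re module. The name hinted and the exact shape of the corrected domain class are my own assumptions about the intent, not the asker's final rules.

```python
import re

# Every printable ASCII character that the range +-? covers.
covered = ''.join(chr(c) for c in range(0x20, 0x7F)
                  if re.fullmatch(r'[+-?]', chr(c)))
print(covered)  # +,-./0123456789:;<=>?

# Local part exactly as written in the question.
LOCAL = r'.+[^\x20-\x2A\x2C\x2F\x3A-\x40\x5B-\x5E\x60\x7B-\xFF]'
original = re.compile(LOCAL + r'\#[\w+-?]+(\.{1})\w{2,}')

# Assumption: hyphen moved to the front so it is a literal, "?" dropped,
# and ^ / $ added so the entire string has to match.
hinted = re.compile('^' + LOCAL + r'\#[-\w+]+(\.{1})\w{2,}$')

bad = 'baz_bar.test+private#e-mail-testing-service..com'
print(bool(original.search(bad)))        # True: the range swallows the first dot
print(bool(hinted.search(bad)))          # False
print(bool(original.search('a#b.com')))  # False: .+[^...] needs two characters
```

The one-character local part is still rejected by the hinted pattern; that part is left as the exercise.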

RegEx to match acronyms

I am trying to write a regular expression that will match values such as U.S., D.C., U.S.A., etc.
Here is what I have so far -
\b([a-zA-Z]\.){2,}+
Note how this expression matches but does not include the last letter in the acronym.
Can anyone help explain what I am missing here?
SOLUTION
I'm posting the solution here in case this helps anyone.
\b(?:[a-zA-Z]\.){2,}
It seems as if a non-capturing group is required here.
Try (?:[a-zA-Z]\.){2,}
?: (non-capturing group) is there because you want to omit capturing the last iteration of the repeated group.
For example, without ?:, 'U.S.A.' will yield a group match 'A.', which you are not interested in.
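A small sketch of that behaviour, assuming Python's re module and a made-up sample sentence. The trailing possessive + from the question is left out because not every version of re accepts it.

```python
import re

text = 'She moved from the U.S.A. to Washington, D.C. last year.'

# Capturing group: findall reports the group, i.e. only its last repetition.
captured = re.findall(r'\b([a-zA-Z]\.){2,}', text)
print(captured)  # ['A.', 'C.']

# Non-capturing group: findall reports the whole match.
whole = re.findall(r'\b(?:[a-zA-Z]\.){2,}', text)
print(whole)  # ['U.S.A.', 'D.C.']

# The overall match is identical either way; only the reported group differs.
first = re.search(r'\b([a-zA-Z]\.){2,}', text)
print(first.group(0), first.group(1))  # U.S.A. A.
```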
None of these proposed solutions do what yours does - make sure that there are at least 2 letters in the acronym. Also, yours works on http://rubular.com/ . This is probably some issue with the regex implementation - to be fair, all of the matches that you got were valid acronyms. To fix this, you could either:
Make sure there's a space or EOF succeeding your expression ((?=\s|$) in ruby at least)
Surround your regex with ^ and $ to make sure it catches the whole string. You'd have to split the whole string on spaces to get matches with this though.
I prefer the former solution - to do this you'd have:
\b([a-zA-Z]\.){2,}(?=\s|$)
Edit: I've realized this doesn't actually work with other punctuation in the string, and a couple of other edge cases. This is super ugly, but I think it should be good enough:
(?<=\s|^)((?:[a-zA-Z]\.){2,})(?=[[:punct:]]?(?:\s|$))
This assumes that you've got this [[:punct:]] character class, and allows for 0-1 punctuation marks after an acronym that won't be captured. I've also fixed it up so that there's a single capture group that gets the whole acronym. Check out validation at http://rubular.com/r/lmr0qERLDh
Bonus: you now get to make this super confusing to anyone reading it.
This should work:
/([a-zA-Z]\.)+/g
I have slightly modified the solution above:
\b(?:[a-zA-Z]+\.){2,}
to enable capturing acronyms containing more than one letter between the dots, like in 'GHQ.AFP.X.Y'
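A quick comparison of the two variants, assuming Python's re module. The second input is a made-up example showing a side effect of allowing several letters between dots.

```python
import re

single = re.compile(r'\b(?:[a-zA-Z]\.){2,}')   # one letter per dot
multi = re.compile(r'\b(?:[a-zA-Z]+\.){2,}')   # one or more letters per dot

print(single.findall('GHQ.AFP.X.Y'))  # []
print(multi.findall('GHQ.AFP.X.Y'))   # ['GHQ.AFP.X.']

# Trade-off: dotted words such as host names now match as well.
print(multi.findall('see www.example.com today'))  # ['www.example.']
```

Note that the final Y is not part of the match, because the pattern requires a dot after every letter run.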

Regex to allow certain special characters

Currently I have the following regex, which I use to validate the name of a company/industry, and it's working fine:
/(?=[a-zA-Z0-9-]{5,25}$)^[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*$/
The above regex doesn't support special characters like & - . _ which are valid in my case.
I came up with this, but it wasn't working as expected:
/(?=[a-zA-Z0-9-\&\_\.]{5,25}$)^[a-zA-Z0-9\&\_\.]+(-[a-zA-Z0-9\&\_\.]+)*$/
Can someone point out where my regex above goes wrong? A short explanation of the regex would also be greatly appreciated.
Thanks
I don't think you have to escape & as \&, and the same goes for _:
/(?=[a-zA-Z0-9-&_\.]{5,25}$)^[a-zA-Z0-9&_\.]+(-[a-zA-Z0-9&_\.]+)*$/
If I'm not wrong, inside a character class you don't have to put a backslash before every special character; only the backslash itself, ], a leading ^, and a - between two characters need care. So your regular expression would be
/(?=[a-zA-Z0-9-&_.]{5,25}$)^[a-zA-Z0-9&_.]+(-[a-zA-Z0-9&_.]+)*$/
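Since the question also asks for a short explanation, here is a commented sketch of the same pattern, assuming Python's re module. I moved ^ to the front and put the hyphen last in the lookahead's class so it is unambiguously literal; the names NAME and is_valid and the sample inputs are made up for illustration.

```python
import re

NAME = re.compile(
    r'^(?=[a-zA-Z0-9&_.-]{5,25}$)'   # lookahead: 5-25 allowed characters in total
    r'[a-zA-Z0-9&_.]+'               # first run of non-hyphen characters
    r'(?:-[a-zA-Z0-9&_.]+)*$'        # further runs, each introduced by one hyphen
)


def is_valid(name):
    """Return True if the name fits the length and hyphen rules above."""
    return NAME.match(name) is not None


# Hypothetical sample inputs.
for name in ['Acme-Co', 'A&B_Co.Ltd', '-Acme1', 'Acme--Co', 'Abc']:
    print(name, is_valid(name))

# Escaped and unescaped forms of & _ . behave the same inside a class.
print(bool(re.fullmatch(r'[\&\_\.]+', '&_.')), bool(re.fullmatch(r'[&_.]+', '&_.')))
```

The lookahead handles the overall length, while the rest of the pattern makes sure hyphens only appear singly and between other characters.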

RegExp extraction

Here's the input string:
loadMedia('mediacontainer1', 'http://www.something.com/videos/JohnsAwesomeVideo.flv', 'http://www.something.com/videos/JohnsAwesomeCaption.xml', '/videos/video-splash-image.gif)
With this RegExp: \'.+.xml\'
... we get this:
'mediacontainer1', 'http://www.something.com/videos/JohnsAwesomeVideo.flv', 'http://www.something.com/videos/JohnsAwesomeCaption.xml'
... but I want to extract only this:
http://www.something.com/videos/JohnsAwesomeCaption.xml
Any suggestions? I'm sure this problem has been asked before, but it's difficult to search for. I'll be happy to Accept a solution.
Thanks!
If you want to get everything within quotes that starts with http:
(?<=')http:[^']+(?=')
If you only want those ending with .xml
(?<=')http:[^']+\.xml(?=')
It doesn't select the quotation marks (as you asked)
It's fast!
Fair warning: it only works if the regex engine you're using can handle lookbehind
Knowing the language would be helpful. Basically, you are having a problem because the + quantifier is greedy, meaning it will match the largest part of the string that it can. You need to use a non-greedy quantifier, which will match as little as possible.
We will need to know the language you're in to know what the syntax for the non-greedy quantifier should be.
Here is a perl recipe. Just as a sidenote, instead of .+, you probably want to match [^.]+.xml.
\'.+?.xml\'
should work if your language supports perl-like regexes.
This should work (tested in JavaScript, but I'm pretty sure it would work in most regex flavors):
'[^']+?\.xml'
It applies these rules:
starts with '
is followed by anything but '
ends in .xml'
You can demo it at http://RegExr.com?2tp6q
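A sketch comparing the greedy, lazy, and negated-class patterns on the input from the question, assuming Python's re module (other engines may differ). In this engine the lazy quantifier alone does not help, because the match still begins at the first quote; excluding ' from the repeated class is what narrows it down.

```python
import re

CALL = ("loadMedia('mediacontainer1', "
        "'http://www.something.com/videos/JohnsAwesomeVideo.flv', "
        "'http://www.something.com/videos/JohnsAwesomeCaption.xml', "
        "'/videos/video-splash-image.gif)")

greedy = re.search(r"'.+\.xml'", CALL).group()
lazy = re.search(r"'.+?\.xml'", CALL).group()
negated = re.search(r"'[^']+?\.xml'", CALL).group()

print(greedy == lazy)                          # True
print(greedy.startswith("'mediacontainer1'"))  # True
print(negated)  # 'http://www.something.com/videos/JohnsAwesomeCaption.xml'
```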
In .NET, this regex works for me:
\'[\w:/.]+\.xml\'
Breaking it down:
a ' character
followed by a word character or ':' or '/' or '.' any number of times (which matches the url bit)
followed by '.xml' (which differentiates the sought string from the other urls which it will match without this)
followed by another ' character
I tested it here
Edit
I missed that you don't want the quotes in the result. In that case, as has been pointed out, you need to use lookbehind and lookahead to include the quotes in the search but not in the result. Again in .NET:
(?<=')[\w:/.]+\.xml(?=')
but I think the best solution is a combination of those offered already:
(?<=')[^']+\.xml(?=')
which seems the simplest to read, at least to me.
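A sketch of the lookaround patterns on the question's input, assuming Python's re module, which supports fixed-width lookbehind. The capture-group variant is an alternative I added for engines without lookbehind; the variable names are illustrative.

```python
import re

CALL = ("loadMedia('mediacontainer1', "
        "'http://www.something.com/videos/JohnsAwesomeVideo.flv', "
        "'http://www.something.com/videos/JohnsAwesomeCaption.xml', "
        "'/videos/video-splash-image.gif)")

# Lookbehind/lookahead require the quotes but leave them out of the match.
xml_lookaround = re.findall(r"(?<=')[^']+\.xml(?=')", CALL)
print(xml_lookaround)  # ['http://www.something.com/videos/JohnsAwesomeCaption.xml']

# A capture group gives the same result where lookbehind is unavailable.
xml_group = re.findall(r"'([^']+\.xml)'", CALL)
print(xml_group == xml_lookaround)  # True

# Everything in quotes that starts with http: (both URLs).
http_urls = re.findall(r"(?<=')http:[^']+(?=')", CALL)
print(len(http_urls))  # 2
```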

Password validation regex

I am trying to get one regular expression that does the following:
makes sure there are no white-space characters
minimum length of 8
makes sure there is at least:
one non-alpha character
one upper case character
one lower case character
I found this regular expression:
((?=.*[^a-zA-Z])(?=.*[a-z])(?=.*[A-Z])(?!\s).{8,})
which takes care of points 2 and 3 above, but how do I add the first requirement to this regex?
I know I can use two expressions, the one above and then
\s
but I'd like to have it all in one. I tried doing something like ?!\s but I couldn't get it to work. Any ideas?
^(?=.*[^a-zA-Z])(?=.*[a-z])(?=.*[A-Z])\S{8,}$
should do. Be aware, though, that you're only validating ASCII letters. Is Ä not a letter for your requirements?
\S means "any character except whitespace", so by using this instead of the dot, and by anchoring the regex at the start and end of the string, we make sure that the string doesn't contain any whitespace.
I also removed the unnecessary parentheses around the entire expression.
The answer above works well, and is a good reminder that there are many ways to solve the same problem with regexes, but you were on the right track to finding a solution yourself. If you had changed (?!\s) to (?!.*\s) and added the ^ and $ anchors at the start and end, it would work.
^((?=.*[^a-zA-Z])(?=.*[a-z])(?=.*[A-Z])(?!.*\s).{8,})$
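A sketch that runs the three patterns from this thread side by side, assuming Python's re module; the constant names and sample passwords are made up for illustration.

```python
import re

# \S version: every character must be non-whitespace.
NO_SPACE_CLASS = re.compile(r'^(?=.*[^a-zA-Z])(?=.*[a-z])(?=.*[A-Z])\S{8,}$')

# Negative-lookahead version: no whitespace anywhere ahead of the start.
NO_SPACE_LOOKAHEAD = re.compile(
    r'^((?=.*[^a-zA-Z])(?=.*[a-z])(?=.*[A-Z])(?!.*\s).{8,})$')

# Pattern from the question: (?!\s) only inspects the first character.
QUESTION = re.compile(r'((?=.*[^a-zA-Z])(?=.*[a-z])(?=.*[A-Z])(?!\s).{8,})')

# Hypothetical sample passwords.
samples = ['Passw0rd', 'Pass w0rd', 'password1', 'PASSWORD1', 'PassWord', 'Pw1']
for s in samples:
    print(repr(s),
          bool(NO_SPACE_CLASS.search(s)),
          bool(NO_SPACE_LOOKAHEAD.search(s)),
          bool(QUESTION.search(s)))
```

Only 'Passw0rd' passes the two corrected patterns, while the question's pattern also lets 'Pass w0rd' through.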