I'm looking to match Twitter syntax with a regex.
How can I match anything of the form "#______", that is, something that begins with a # symbol and is followed by no spaces, just letters and numbers, until the end of the word? (For Twitter users: I want to match someone's name in a reply.)
Go for
/#(\w+)/
to get the matching name extracted as well.
#\w+
That simple?
It should be noted that Twitter no longer allows usernames longer than 15 characters, so you can also match with:
#\w{1,15}
There are still apparently a few people with usernames longer than 15 characters, but testing on 15 would be better if you want to exclude likely false positives.
There are apparently no rules regarding whether underscores can be used at the beginning or end of usernames, multiple underscores, etc., and there are accounts with single-letter names, as well as someone with the username "_".
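As a rough illustration of the patterns above, here is a minimal Python sketch (Python is used only for illustration; the thread does not name a language). The helper name find_mentions and the trailing (?!\w) lookahead are my own additions: without the lookahead, #\w{1,15} would still match the first 15 characters of a longer name. The re.ASCII flag restricts \w to [a-zA-Z0-9_].

```python
import re

# Assumption: names are ASCII letters, digits and underscores, 1-15 long.
# The (?!\w) lookahead is an addition, not part of the answers above: it
# rejects names longer than 15 characters instead of truncating them.
MENTION = re.compile(r"#(\w{1,15})(?!\w)", re.ASCII)

def find_mentions(text):
    """Return the names (without the leading #) found in text."""
    return MENTION.findall(text)

print(find_mentions("#alice thanks, cc #bob_42 and #_"))
print(find_mentions("#abcdefghijklmnopqrst"))  # 20 characters, so no match
```

The capture group plays the same role as in /#(\w+)/: findall returns only the captured name, not the leading #.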
#[\d\w]+
\d for a digit character
\w for a word character
[] to denote a character class
+ to represent one or more instances of the character class
Note that these specifiers for word and digit characters are language dependent. Check the language specification to be sure.
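To make the "language dependent" caveat concrete, here is a small sketch of how Python's re module treats these specifiers. This describes Python only; other engines may differ, so treat it as one example rather than a general rule.

```python
import re

text = "#jos\u00e9"  # the name ends in an accented "e", written as an escape

# For str patterns in Python 3, \w is Unicode-aware and matches the accented letter.
print(re.findall(r"#[\d\w]+", text))

# With re.ASCII, \w is limited to [a-zA-Z0-9_], so the match stops early.
print(re.findall(r"#[\d\w]+", text, re.ASCII))

# In Python, \d is already contained in \w, so both patterns match the same text here.
print(re.findall(r"#\w+", "#user123") == re.findall(r"#[\d\w]+", "#user123"))
```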
There is a very extensive API for extracting valid Twitter names, mentions, etc. The Java version of the API provided by Twitter can be found on GitHub as twitter-text-java. You may want to take a look at it to see if this is something you can use.
I have used it to validate Twitter names and it works very well.
For writing a parser I need to be able to identify keywords which can be abbreviated,
for example
MY-KEYWORD
should at least be MY-KEY but can also be any abbreviation longer than this, here specifically MY-KEYW, MY-KEYWO, MY-KEYWOR or the full MY-KEYWORD.
For the life of me, no regex I have tried so far (and there were many ...) matches exact substrings of something with a minimum length :-(
TIA !
Alex
Match the prefix and then optional characters after it to finish the full keyword.
\bMY-KEY(?:W(?:O(?:RD?)?)?)?\b
All the groups are needed to ensure that no optional letters are skipped. If you wrote MY-KEYW?O?R?D? it would also match MY-KEYD.
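If there are many such keywords, the nested optional groups can be generated rather than typed by hand. The following is a minimal Python sketch; the function name abbreviation_pattern and its min_len parameter are hypothetical, and it assumes the keyword starts and ends with a word character so that \b behaves as intended.

```python
import re

def abbreviation_pattern(keyword, min_len):
    """Build a regex matching keyword or any prefix of at least min_len characters."""
    prefix = re.escape(keyword[:min_len])
    tail = ""
    # Wrap from the last letter inwards, so a letter can only be present
    # if every letter before it is present: (?:W(?:O(?:R(?:D)?)?)?)?
    for ch in reversed(keyword[min_len:]):
        tail = "(?:" + re.escape(ch) + tail + ")?"
    return re.compile(r"\b" + prefix + tail + r"\b")

pattern = abbreviation_pattern("MY-KEYWORD", 6)
for candidate in ["MY-KEY", "MY-KEYW", "MY-KEYWORD", "MY-KEYD", "MY-KE"]:
    print(candidate, bool(pattern.fullmatch(candidate)))
```

Only the first three candidates match; MY-KEYD is rejected because D can only appear after W, O and R.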
I'm trying to write a regex pattern to validate Unique Transaction Identifiers (UTI). See description: here
The UTI consists of two concatenated parts, the prefix and the transaction identifier. Here is a summary of the rules I'm trying to take into account:
The prefix is exactly 10 alphanumeric characters.
The transaction identifier is 1-32 characters long.
The transaction identifier is alphanumeric, however the following special characters are also allowed: . : _ -
The special characters can not appear at the beginning or end of the transaction identifier.
It is not allowed to have two special characters in a row.
I have so far constructed a pattern to validate the UTI for the first 4 of these points (matched with ignored casing):
^[A-Z0-9]{11}((\w|[:\.-]){0,30}[A-Z0-9])?$
However I'm struggling with the last point (no two special characters in a row). I readily admit to being a bit of a novice when it comes to regex and I was thinking there might be some more advanced technique that I'm not familiar with to accomplish this. Any regex gurus out there care to enlighten me?
Solved: Thanks to user Bohemian for helping me find the pattern I was looking for. My final solution looks like this:
^[a-zA-Z0-9]{11}((?!.*[.:_-]{2})[a-zA-Z0-9.:_-]{0,30}[a-zA-Z0-9])?$
I will leave the question open for follow-up answers in case anyone has any further suggestions for improvements.
Try this:
^[A-Z0-9]{11}(?!.*[.:_-]{2})[A-Z0-9.:_-]{0,30}[A-Z0-9]$
The secret sauce is the negative look ahead (?!.*[.:_-]{2}), which asserts (without consuming input) that the following text does not contain 2 consecutive "special" chars .:_-.
Note that your attempt, which uses \w, allows lowercase letters and underscores too, because
\w is the same as [a-zA-Z0-9_]
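To check the pattern against the rules listed in the question, here is a small Python sketch using the pattern from the "Solved" edit. The helper name is_valid_uti and the sample values are made up for illustration; they are not real UTIs.

```python
import re

# Pattern copied from the "Solved" edit in the question.
UTI = re.compile(
    r"^[a-zA-Z0-9]{11}((?!.*[.:_-]{2})[a-zA-Z0-9.:_-]{0,30}[a-zA-Z0-9])?$"
)

def is_valid_uti(value):
    # fullmatch is used because in Python "$" also matches before a trailing newline.
    return UTI.fullmatch(value) is not None

samples = [
    "ABCDEFGHIJ1",       # prefix plus a one-character identifier
    "ABCDEFGHIJ1a.b-c",  # single special characters between alphanumerics
    "ABCDEFGHIJ1a..b",   # two special characters in a row
    "ABCDEFGHIJ1ab.",    # special character at the end
    "ABCDEFGHI",         # too short
]
for s in samples:
    print(s, is_valid_uti(s))
```

The first two samples are accepted and the last three are rejected. The lookahead sits right after the first 11 characters, so it scans the whole rest of the identifier before any of it is consumed.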
So I've got /.+[^\x20-\x2A\x2C\x2F\x3A-\x40\x5B-\x5E\x60\x7B-\xFF]\#[\w+-?]+(.{1})\w{2,}/ pattern I want to use for email validation on client-side, which doesn't work as expected.
I know that my pattern is simple and doesn't cover every standard possibility, but it's part of my regex training.
Local part of address should be valid only when it has at least one digit [0-9] or letter [a-zA-Z] and can be mixed with comma or plus sign or underscore (or all at once) and then # sign, then domain part, but no IP address literals, only domain names with at least one letter or digit, followed by one dot and at least two letters or two digits.
In test string form it doesn't validate a#b.com and does validate baz_bar.test+private#e-mail-testing-service..com, which is wrong - it should be vice versa - validate a#b.com and not validate baz_bar.test+private#e-mail-testing-service..com
What specific error do I have there, and where?
I can't locate it, sorry.
You need to change your regex
From: .+[^\x20-\x2A\x2C\x2F\x3A-\x40\x5B-\x5E\x60\x7B-\xFF]\#[\w+-?]+(\.{1})\w{2,}
To: .+[^\x20-\x2A\x2C\x2F\x3A-\x40\x5B-\x5E\x60\x7B-\xFF]?\#[\w+-]+(\.{1})\w{2,}
Notice that I added a ? before the # sign and removed the ? from the first "group" after the # sign. Adding that ? tells your regex that the whole "group" is not mandatory.
See it working here: https://regex101.com/r/iX5zB5/2
You're requiring the local part (before #) to be at least two characters with the .+ followed by the character class [^...]. It's looking for any character followed by another character not in the list of exclusions you specify. That explains why "a#b.com" doesn't match.
The second problem is partly caused by the character class range +-? which includes the . character. I think you wanted [-\w+?]+. (Do you really want question marks?) And then later I think you wanted to look for a literal . character but it really ends up matching the first character that didn't match the previous block.
Between the regex provided and the explanatory text I'm not sure what rules you intend to implement though. And since this is an exercise it's probably better to just give hints anyway.
You will also want to use the ^ and $ anchors to make sure the entire string matches.
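In keeping with the "hints only" spirit, here is a short Python sketch that demonstrates the three points above in isolation rather than giving a finished email pattern. The question is about client-side validation, so Python is used purely for illustration, and [^\s] is a simplified stand-in for the long exclusion class in the question.

```python
import re

# The range +-? runs from "+" (0x2B) to "?" (0x3F), so it contains "." too.
print(bool(re.fullmatch(r"[+-?]", ".")))  # range form
print(bool(re.fullmatch(r"[-+?]", ".")))  # hyphen first, so it is a literal "-"

# ".+" followed by one more character needs at least two characters before "#".
two_char_local = re.compile(r".+[^\s]#")
print(bool(two_char_local.search("a#b.com")))
print(bool(two_char_local.search("ab#b.com")))

# Without anchors, a pattern can succeed on just part of the string.
print(bool(re.search(r"\w+#\w+", "!!a#b!!")))
print(bool(re.search(r"^\w+#\w+$", "!!a#b!!")))
```

The six lines print True, False, False, True, True, False.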
I am working with ASP.NET MVC 5 application in which I want to add dataannotation validation for Name field.
It should accept any combination of numbers, letters, and underscores only.
I tried this, but it is not working:
[RegularExpression("([a-zA-Z0-9_ .&'-]+)", ErrorMessage = "Invalid.")]
Try this regex, which I put together on regexr.com.
Criteria: alphanumeric, underscore, and space.
http://regexr.com/3agii
([a-zA-Z0-9_\s]+)
You are using a character class, that is, the thing between the square brackets ([a-zA-Z0-9_ .&'-]). Within those square brackets you define all the characters that should be matched by this class. So the problem is easy to spot: your class allows characters you don't want to match.
Based on your "try" you could change this to
[a-zA-Z0-9_]
Those seem to be the characters you want to match. But is it really what you need? Are those really the only characters that are possible for that field?
If yes then you are done.
If no, you probably want to add all characters of all languages. Luckily there is a Unicode property for that:
\p{L} All letter characters
There is another predefined group that could be useful for you:
\w matches any word character (The definition can also be found in the first link, includes the Unicode categories Ll,Lu,Lt,Lo,Lm,Nd,Pc, that is basically [a-zA-Z0-9_] but Unicode style with all letters and more connecting characters)
But still, if you want to match real names this will not cover all possible names. I have another answer on this topic here
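The difference between the ASCII-only class and a Unicode-aware \w can be seen in a few lines. This is a Python sketch for illustration only, since the question is about .NET; Python's built-in re module does not support \p{L}, so the sketch uses \w, which is Unicode-aware for str patterns in Python 3. The helper names are hypothetical.

```python
import re

ASCII_NAME = re.compile(r"[a-zA-Z0-9_]+")
UNICODE_NAME = re.compile(r"\w+")  # Unicode letters, digits and underscore

def is_ascii_name(value):
    # fullmatch requires the whole value to consist of allowed characters.
    return ASCII_NAME.fullmatch(value) is not None

def is_unicode_name(value):
    return UNICODE_NAME.fullmatch(value) is not None

for value in ["John_Doe42", "J\u00fcrgen", "O'Brien"]:
    print(value, is_ascii_name(value), is_unicode_name(value))
```

The name with the umlaut passes only the Unicode-aware check, and the name with the apostrophe fails both, which is the "real names" problem mentioned above.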
I am weak in regex but I am learning. Currently I have a requirement to validate name and I am not able to write a valid regex for it. A valid name would contain alphabet only or alphabet with hyphens or spaces.
Example of valid name would be
jones
jones-smiht
a loreal jones
but if the name contains digits it's an invalid name. The following regex
^[-\\sa-zA-Z]+$ works fine, but a lone - is also considered a valid name.
How do I modify it so that a valid name must contain letters regardless of whether it contains hyphens and spaces?
I think you're looking for this regex:
^[a-zA-Z][-\\sa-zA-Z]*$
This will make sure your name always starts with a letter instead of starting with hyphen or space.
Note: In Java you can also make use of (?i) for ignore case and shorten your regex as follows:
(?i)^[a-z][-\\sa-z]*$
The literal answer for you would be ^[a-zA-Z][-\sa-zA-Z]*$.
There are better answers: for instance,
([a-zA-Z]+)([-\s][a-zA-Z]+)*
will allow any number of words separated by single space or dash, allowing for simon peyton-jones, but disallowing silliness like --jumbo-spaz--.
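Here is a quick check of that pattern against the examples from the question, as a Python sketch (the question itself is about Java; the helper name is_valid_name is made up). fullmatch is used so that the whole string has to fit the pattern.

```python
import re

# One or more words, each separated by exactly one space or hyphen.
NAME = re.compile(r"([a-zA-Z]+)([-\s][a-zA-Z]+)*")

def is_valid_name(name):
    return NAME.fullmatch(name) is not None

for name in ["jones", "jones-smiht", "a loreal jones", "simon peyton-jones",
             "-", "--jumbo-spaz--", "jones2"]:
    print(repr(name), is_valid_name(name))
```

The first four names are accepted and the last three are rejected.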
And copied from the response I tried to publish on the deleted answer:
The regex itself uses a single backslash. However, since regexes are constructed from strings in Java, you need to escape the backslash; that is a feature of string literals, not of regexes. So the regex is \s, but you need to write Pattern.compile("\\s") in Java. Not all languages have this twist, so it is useful to keep the rules of strings separate from the rules of regexes.
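The same separation between string rules and regex rules shows up in other languages. As an analogy only (this is Python, not Java), an ordinary Python string literal needs the doubled backslash just like Java, while a raw string literal lets you write the regex as it is:

```python
import re

escaped = "\\s"  # ordinary literal: the backslash has to be escaped
raw = r"\s"      # raw literal: written exactly as the regex reads

# Both literals produce the same two-character string: a backslash and "s".
print(escaped == raw, len(raw))

# Either one compiles to the same regex, which matches a whitespace character.
print(bool(re.fullmatch(escaped, " ")), bool(re.fullmatch(raw, "\t")))
```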