Regex to validate Unique Transaction Identifier - regex

I'm trying to write a regex pattern to validate Unique Transaction Identifiers (UTI). See description: here
The UTI consists of two concatenated parts, the prefix and the transaction identifier. Here is a summary of the rules I'm trying to take into account:
The prefix is exactly 10 alphanumeric characters.
The transaction identifier is 1-32 characters long.
The transaction identifier is alphanumeric, however the following special characters are also allowed: . : _ -
The special characters can not appear at the beginning or end of the transaction identifier.
It is not allowed to have two special characters in a row.
I have so far constructed a pattern to validate the UTI for the first 4 of these points (matched with ignored casing):
^[A-Z0-9]{11}((\w|[:\.-]){0,30}[A-Z0-9])?$
However I'm struggling with the last point (no two special characters in a row). I readily admit to being a bit of a novice when it comes to regex and I was thinking there might be some more advanced technique that I'm not familiar with to accomplish this. Any regex gurus out there care to enlighten me?
Solved: Thanks to user Bohemian for helping me find the pattern I was looking for. My final solution looks like this:
^[a-zA-Z0-9]{11}((?!.*[.:_-]{2})[a-zA-Z0-9.:_-]{0,30}[a-zA-Z0-9])?$
I will leave the question open for follow-up answers in case anyone has any further suggestions for improvements.

Try this:
^[A-Z0-9]{11}(?!.*[.:_-]{2})[A-Z0-9.:_-]{0,30}[A-Z0-9]$
The secret sauce is the negative look ahead (?!.*[.:_-]{2}), which asserts (without consuming input) that the following text does not contain 2 consecutive "special" chars .:_-.
Note that your attempt, which uses \w, allows lowercase letters and underscores too, because
\w is the same as [a-zA-Z0-9_]

Related

RegExp space character

I have this regular expression: ^[a-zA-Z]\s{3,16}$
What I want is to match any name with any spaces, for example, John Smith and that contains 3 to 16 characters long..
What am I doing wrong?
Background
There are a couple of things to note here. First, a quantifier (in this case, {3,16}) only applies to the last regex token. So what your current regex really is saying is to "Match any string that has a single alphabetical character (case-insensitive) followed by 3 to 16 whitespace characters (e.g. spaces, tabs, etc.)."
Second, a name can have more than 2 parts (a middle name, certain ethnic names like "De La Cruz") or include special characters such as accented vowels. You should consider if this is something you need to account for in your program. These things are important and should be considered for any real application.
Assumptions and Answer
Now, let's just assume you only want a certain format for names that consists of a first name, a last name, and a space. Let's also assume you only want simple ASCII characters (i.e. no special characters or accented characters). Furthermore, both the first and last names should start with a capital character followed by only lower-case characters. Other than that, there are no restrictions on the length of the individual parts of the name. In this case, the following regex would do the trick:
^(?=.{3,16}$)[A-Z][a-z]+ [A-Z][a-z]+$
Notes
The first token after the ^ character is what is called a positive lookahead. Basically a positive look ahead will match the regex between the opening (?= and closing ) without actually moving the position of the cursor that is matching the string.
Notice I removed the \s token, since you usually want only a (space). The space can be replaced with the \s token, if tabs and other whitespace is desired there.
I also added a restriction that a name must start with a capital letter followed by only lower-case letters.
Crude English Translation
To help your understanding, here is a simple English translation of what the regex is really doing. The part in italics is just copied from the first part of the English translation of the regex.
"Match any string that has 3-16 characters and starts with a capital alphabetical character followed by one or more (+) alphabetical characters followed by a single space followed by a capital alphabetical character followed by one or more (+) alphabetical characters and ends with any lowercase letter."
Tools
There are a couple of tools I like to use when I am trying to tackle a challenging regex. They are listed below in no particular order:
https://regex101.com/ - Allows you to test regex expressions in real time. It also has a nifty little library to help you along.
http://www.regular-expressions.info/ - Basically a repository of knowledge on regex.
Edit/Update
You mentioned in your comments that you are using your regex in JavaScript. JavaScript uses a forward slash surrounding the regex to determine what is a regex. For this simple case, there are 2 options for using a regex to match a string.
First, use String's match method as follows
"John Smith".match(/^(?=.{3,16}$)[A-Z][a-z]+ [A-Z][a-z]+$/);
Second, create a regex and use its test() method. For example,
/^(?=.{3,16}$)[A-Z][a-z]+ [A-Z][a-z]+$/.test("John Smith");
The latter is probably what you want as it simply returns true or false depending on whether the regex actually matches the string or not.

Improving the below RegEx for US and UK Names

I'm looking to validate English names. How could I improve this RegEx in order to ensure that the hyphen, comma or apostrophe does not come first or last in the string? Further, a hyphen or apostrophe must always be directly preceded and succeeded with a letter.
I have the RegEx: [a-zA-Z-', ].
My understanding is that this will allow for all letters, in all cases, hyphens, commas and apostrophes along with whitespace at any location in the string.
Quite new to the language so just getting my head around it.
Thanks.
Ok, since it's only given a presumed name, here's what you might want to try:
^[A-Z][A-Za-z'-]+[a-z](,? [A-Z][A-Za-z'-]+[a-z])*$
This will work with names like O'Harry Jefferson-Wayne, but will reject words not ending with a small English letter.
The gist of it is this [A-Z] start of name, [A-Za-z'-]+ continuation, [a-z] end of name. then a repeatable group of optional names, with an optional comma delimiter, but required space.
I took more time writing the examples to test at http://www.regexplanet.com/. That's where I practice my regular expressions when I'm developing. You should try it.

Custom email validation regex pattern not working properly

So I've got /.+[^\x20-\x2A\x2C\x2F\x3A-\x40\x5B-\x5E\x60\x7B-\xFF]\#[\w+-?]+(.{1})\w{2,}/ pattern I want to use for email validation on client-side, which doesn't work as expected.
I know that my pattern is simple and doesn't cover every standard possibility, but it's part of my regex training.
Local part of address should be valid only when it has at least one digit [0-9] or letter [a-zA-Z] and can be mixed with comma or plus sign or underscore (or all at once) and then # sign, then domain part, but no IP address literals, only domain names with at least one letter or digit, followed by one dot and at least two letters or two digits.
In test string form it doesn't validate a#b.com and does validate baz_bar.test+private#e-mail-testing-service..com, which is wrong - it should be vice versa - validate a#b.com and not validate baz_bar.test+private#e-mail-testing-service..com
What specific error I've got there and where?
I can't locate this, sorry..
You need to change your regex
From: .+[^\x20-\x2A\x2C\x2F\x3A-\x40\x5B-\x5E\x60\x7B-\xFF]\#[\w+-?]+(\.{1})\w{2,}
To: .+[^\x20-\x2A\x2C\x2F\x3A-\x40\x5B-\x5E\x60\x7B-\xFF]?\#[\w+-]+(\.{1})\w{2,}
Notice that I added a ? before the # sign and removed the ? from the first "group" after the # sign. Adding that ? will make your regex to know that hole "group" is not mandatory.
See it working here: https://regex101.com/r/iX5zB5/2
You're requiring the local part (before #) to be at least two characters with the .+ followed by the character class [^...]. It's looking for any character followed by another character not in the list of exclusions you specify. That explains why "a#b.com" doesn't match.
The second problem is partly caused by the character class range +-? which includes the . character. I think you wanted [-\w+?]+. (Do you really want question marks?) And then later I think you wanted to look for a literal . character but it really ends up matching the first character that didn't match the previous block.
Between the regex provided and the explanatory text I'm not sure what rules you intend to implement though. And since this is an exercise it's probably better to just give hints anyway.
You will also want to use the ^ and $ anchors to makes sure the entire string matches.

What is wrong with my simple regex that accepts empty strings and apartment numbers?

So I wanted to limit a textbox which contains an apartment number which is optional.
Here is the regex in question:
([0-9]{1,4}[A-Z]?)|([A-Z])|(^$)
Simple enough eh?
I'm using these tools to test my regex:
Regex Analyzer
Regex Validator
Here are the expected results:
Valid
"1234A"
"Z"
"(Empty string)"
Invalid
"A1234"
"fhfdsahds527523832dvhsfdg"
Obviously if I'm here, the invalid ones are accepted by the regex. The goal of this regex is accept either 1 to 4 numbers with an optional letter, or a single letter or an empty string.
I just can't seem to figure out what's not working, I mean it is a simple enough regex we have here. I'm probably missing something as I'm not very good with regexes, but this syntax seems ok to my eyes. Hopefully someone here can point to my error.
Thanks for all help, it is greatly appreciated.
You need to use the ^ and $ anchors for your first two options as well. Also you can include the second option into the first one (which immediately matches the third variant as well):
^[0-9]{0,4}[A-Z]?$
Without the anchors your regular expression matches because it will just pick a single letter from anywhere within your string.
Depending on the language, you can also use a negative look ahead.
^[0-9]{0,4}[A-Za-z](?!.*[0-9])
Breakdown:
^[0-9]{0,4} = This look for any number 0 through 4 times at the beginning of the string
[A-Za-z] = This look for any characters (Both cases)
(?!.*[0-9]) = This will only allow the letters if there are no numbers anywhere after the letter.
I haven't quite figured out how to validate against a null character, but that might be easier done using tools from whatever language you are using. Something along this logic:
if String Doesn't equal $null Then check the Rexex
Something along those lines, just adjusted for however you would do it in your language.
I used RegEx Skinner to validate the answers.
Edit: Fixed error from comments

Regex match, quite simple:

I'm looking to match Twitter syntax with a regex.
How can I match anything that is "#______" that is, begins with an # symbol, and is followed by no spaces, just letters and numbers until the end of the word? (To tweeters, I want to match someone's name in a reply)
Go for
/#(\w+)/
to get the matching name extracted as well.
#\w+
That simple?
It should be noted that Twitter no longer allows usernames longer than 15 characters, so you can also match with:
#\w{1,15}
There are still apparently a few people with usernames longer than 15 characters, but testing on 15 would be better if you want to exclude likely false positives.
There are apparently no rules regarding whether underscores can be used the the beginning or end of usernames, multiple underscores, etc., and there are accounts with single-letter names, as well as someone with the username "_".
#[\d\w]+
\d for a digit character
\w for a word character
[] to denote a character class
+ to represent more than one instances of the character class
Note that these specifiers for word and digit characters are language dependent. Check the language specification to be sure.
There is a very extensive API for how to get valid twitter names, mentions, etc. The Java version of the API provided by Twitter can be found on github twitter-text-java. You may want to take a look at it to see if this is something you can use.
I have used it to validate Twitter names and it works very well.