Idiot-proofing Excel fields of MAC Addresses and Phone Numbers - regex

I was tasked with making an Excel spreadsheet where MAC addresses and directory numbers would be added later. The task, though, was to idiot-proof it somehow.
That is, for the MAC addresses:
- Only allow the characters 0-9 and a-f
- Must have 12 characters
and for the directory number:
- 10 digits
If any of these criteria fail, display an error.
I've been trying to play with regex and data validation, and I'm just not having any luck. I've googled every combination of the terms excel, mac address, regex, and limiting, and haven't gotten much further.
TL;DR I need to check whether entered phone numbers and MAC addresses are properly formatted when entered in a cell.

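For the MAC address: ^[0-9a-f]{12}$
For the directory number: ^\d{10}$
(The ^ and $ anchors matter: without them, a longer entry that merely contains a valid run of characters would also pass.)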
See this answer by allquixotic for how to use these patterns in Excel, but substitute the correct regex.
As the comments mentioned though, idiot proof is going to be an unreachable goal.
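To sanity-check the two patterns outside Excel, here is a minimal Python sketch. It assumes the MAC address is typed as 12 bare characters with no separators and that uppercase A-F should be rejected, as the question describes. The function names are made up for illustration. It uses re.fullmatch so the whole value has to match, not just a substring of it.

```python
import re

# Patterns from the answer; re.ASCII keeps \d limited to 0-9.
MAC_PATTERN = re.compile(r"[0-9a-f]{12}")
DIRECTORY_NUMBER_PATTERN = re.compile(r"\d{10}", re.ASCII)


def is_valid_mac(value: str) -> bool:
    """True if value is exactly 12 characters from 0-9 and a-f."""
    return MAC_PATTERN.fullmatch(value) is not None


def is_valid_directory_number(value: str) -> bool:
    """True if value is exactly 10 digits."""
    return DIRECTORY_NUMBER_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ["001a2b3c4d5e", "001A2B3C4D5E", "00:1a:2b:3c:4d:5e"]:
        print(candidate, is_valid_mac(candidate))
    for candidate in ["5551234567", "555-123-4567"]:
        print(candidate, is_valid_directory_number(candidate))
```

If separators or uppercase letters should be accepted, the input would need to be normalised first or the character class widened.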

Validate Street Address Format

I'm trying to validate the format of a street address in Google Forms using regex. I won't be able to confirm it's a real address, but I would like to at least validate that the string is:
[numbers(max 6 digits)] [word(minimum one to max 8 words with spaces in between and numbers and # allowed)], [words(minimum one to max four words, only letters)], [2 capital letters] [5 digit number]
I want the spaces and commas I left in between the brackets to be required, exactly where I put them in the above example. This would validate
123 test st, test city, TT 12345
That's obviously not a real address, but at least it requires the entry of the correct format. The data is coming from people answering a question on a form, so it will always be just an address, no names. Plus, the addresses are all in one area, South Florida, where pretty much all addresses will match this format. The problem I'm having is people not entering a city, or commas, so I want to give them an error if they don't. So far, I've found this:
^([0-9a-zA-Z]+)(,\s*[0-9a-zA-Z]+)*$
But that doesn't allow for multiple words between the commas, or the capital letters and numbers for zip. Any help would save me a lot of headaches, and I would greatly appreciate it.
There really is a lot to consider when dealing with a street address--more than you can meaningfully deal with using a regular expression. Besides, if a human being is at a keyboard, there's always a high likelihood of typing mistakes, and there just isn't a regex that can account for all possible human errors.
Also, depending on what you intend to do with the address once you receive it, there's all sorts of helpful information you might need that you wouldn't get just from splitting the rough address components with a regex.
As a software developer at SmartyStreets (disclosure), I've learned that regular expressions really are the wrong tool for this job because addresses aren't as 'regular' (standardized) as you might think. There are more rigorous validation tools available, even plugins you can install on your web form to validate the address as it is typed, and which return a wealth of useful metadata and information.
Try Regex:
\d{1,6}\s(?:[A-Za-z0-9#]+\s){0,7}(?:[A-Za-z0-9#]+,)\s*(?:[A-Za-z]+\s){0,3}(?:[A-Za-z]+,)\s*[A-Z]{2}\s*\d{5}
See Demo
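For anyone who wants to try the pattern above without the demo link, here is a small Python sketch. It is not the Google Forms setup itself. The helper name looks_like_address is invented for illustration, and re.fullmatch stands in for the form requiring the whole answer to match.

```python
import re

# The pattern from the answer above, split across two lines for readability.
ADDRESS_PATTERN = re.compile(
    r"\d{1,6}\s(?:[A-Za-z0-9#]+\s){0,7}(?:[A-Za-z0-9#]+,)\s*"
    r"(?:[A-Za-z]+\s){0,3}(?:[A-Za-z]+,)\s*[A-Z]{2}\s*\d{5}"
)


def looks_like_address(text: str) -> bool:
    """True if the whole string has the number/street, city, ST 12345 shape."""
    return ADDRESS_PATTERN.fullmatch(text.strip()) is not None


if __name__ == "__main__":
    samples = [
        "123 test st, test city, TT 12345",   # the example from the question
        "123 test st test city TT 12345",     # commas missing
        "123 test st, TT 12345",              # city missing
    ]
    for sample in samples:
        print(repr(sample), looks_like_address(sample))
```

This only checks the shape of the entry. As the other answer says, it cannot tell whether the address exists.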
Accepts Apt# also:
(^[0-9]{1,5}\s)([A-Za-z]{1,}(\#\s|\s\#|\s\#\s|\s)){1,5}([A-Za-z]{1,}\,|[0-9]{1,}\,)(\s[a-zA-Z]{1,}\,|[a-zA-Z]{1,}\,)(\s[a-zA-Z]{2}\s|[a-zA-Z]{2}\s)([0-9]{5})

Regex structure to identify name(s) before key word

I am trying to write an expression to identify station locations within a sentence in knowledge studio (IBM Watson).
At the moment I have
[^a-z][^\s]*(.*?)\s+station|Station
but it is causing me some problems:
1. It is extracting the whole line rather than just the station (e.g. "Please meet at Angel Station" is extracted rather than just "Angel Station").
2. I can't seem to find how to write an exception within an expression. For example, I would usually want to find all words before station that are not lower case (uppercase, title case or numerical), but if the word is "and" then I want it to continue identifying words (e.g. Highbury and Islington Station, rather than just Islington Station).
Please advise on what I am doing wrong. Thanks!
I think the answer is specific to IBM Watson Knowledge Studio: you have to define the number of word tokens outside of the regex structure. By default this is limited to 5, so it needed to be increased to pick up all of the words correctly. I increased it to 10, which works fine for my purpose.
As for the correct structure, the following worked:
\b[A-Z][A-Za-z']*(?:\s+(?:and|[A-Z][A-Za-z']*))*\s+[Ss]tation
Note - I needed to include the ' symbol to ensure all stations were picked up (e.g. King's Cross Station).
Oak Lane Station is still not being picked up, but this seems to be a bug rather than an issue with the regex, so I have reported it to the IBM Watson team.
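To see what the pattern does on its own, away from Knowledge Studio and its token limit, here is a minimal Python sketch. The function name find_stations and the sample sentences are illustrative only, and Python's re engine may not behave identically to the one Watson uses.

```python
import re

# Capitalised word, then any run of "and" or further capitalised words,
# then "Station" or "station".
STATION_PATTERN = re.compile(
    r"\b[A-Z][A-Za-z']*(?:\s+(?:and|[A-Z][A-Za-z']*))*\s+[Ss]tation"
)


def find_stations(sentence: str) -> list:
    """Return every station name the pattern finds in the sentence."""
    return [match.group(0) for match in STATION_PATTERN.finditer(sentence)]


if __name__ == "__main__":
    for sentence in [
        "Please meet at Angel Station",
        "Change at Highbury and Islington Station",
        "Take the train to King's Cross Station",
        "Get off at Oak Lane Station",
    ]:
        print(find_stations(sentence))
```

In Python the pattern also picks up Oak Lane Station, which is at least consistent with the miss in Knowledge Studio not being caused by the regex itself.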

Can't Get Regex To Work in uBot - Extracting Phone Numbers

Got a block of text I'm trying to pull phone numbers out of.
for example:
Phone Numbers
Any phone numbers that Angelo may currently or previously have used
are displayed below. Run a phone report on a particular number for
more information.
(555) 444-5555 (555) 555-7777 Not seeing something? Access additional
data sources. Accessing premium data sources may reveal hard to find
phone numbers like cell phones
the regex code I wrote to extract the numbers is
.?\d{3}.?\s\d{3}.\d{4}
For whatever reason, the results come back blank and I'm not sure why. I've tested this regex inside a uBot Expression Checker and it pulls the phone numbers out as it should. But once I enter it in uBot it pulls nothing.
Any help? Thanks
FIGURED IT OUT:
.*\d3.?\s\d{3,5}.\d{3,5}
for whatever reason uBot wouldn't display the phone numbers correctly until I had the above worked out.
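For comparison outside uBot, here is a short Python sketch that runs the first pattern from the question against the sample text. It says nothing about why uBot behaved differently. It only shows that the pattern itself finds both numbers in one mainstream regex engine, which matches what the expression checker reported.

```python
import re

SAMPLE_TEXT = """Phone Numbers
Any phone numbers that Angelo may currently or previously have used
are displayed below. Run a phone report on a particular number for
more information.
(555) 444-5555 (555) 555-7777 Not seeing something? Access additional
data sources. Accessing premium data sources may reveal hard to find
phone numbers like cell phones
"""

# The first pattern from the question, unchanged.
ORIGINAL_PATTERN = re.compile(r".?\d{3}.?\s\d{3}.\d{4}")


def extract_phone_numbers(text: str) -> list:
    """Return every substring of text that the pattern matches."""
    return ORIGINAL_PATTERN.findall(text)


if __name__ == "__main__":
    print(extract_phone_numbers(SAMPLE_TEXT))
```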

SQL Server Regular Expression Workaround in T-SQL?

I have some SQLCLR code for working with regular expressions. But now that it is getting migrated into Azure, which does not allow SQLCLR, that's out. I need to find a way to do regex in pure T-SQL.
Master Data Services are not available because the dev edition of MSSQL we have is not R2.
All ideas appreciated, thanks.
Regular expression match samples that need handling
(culled from regexlib and other places over the past few years)
email address
^[\w-]+(\.[\w-]+)*@([a-z0-9-]+(\.[a-z0-9-]+)*?\.[a-z]{2,6}|(\d{1,3}\.){3}\d{1,3})(:\d{4})?$
dollars
^(\$)?(([1-9]\d{0,2}(\,\d{3})*)|([1-9]\d*)|(0))(\.\d{2})?$
uri
^(http|https|ftp)\://([a-zA-Z0-9\.\-]+(\:[a-zA-Z0-9\.&%\$\-]+)*@)*((25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])|localhost|([a-zA-Z0-9\-]+\.)*[a-zA-Z0-9\-]+\.(com|edu|gov|int|mil|net|org|biz|arpa|info|name|pro|aero|coop|museum|[a-zA-Z]{2}))(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\?\'\\\+&%\$#\=~_\-]+))*$
one numeric digit
^\d$
percentage
^-?[0-9]{0,2}(\.[0-9]{1,2})?$|^-?(100)(\.[0]{1,2})?$
height notation
^\d?\d'(\d|1[01])"$
numbers between 1 and 1000
^([1-9]|[1-9]\d|1000)$
credit card numbers
^((4\d{3})|(5[1-5]\d{2})|(6011))-?\d{4}-?\d{4}-?\d{4}|3[4,7]\d{13}$
list of years
^([1-9]{1}[0-9]{3}[,]?)*([1-9]{1}[0-9]{3})$
days of the week
^(Sun|Mon|(T(ues|hurs))|Fri)(day|\.)?$|Wed(\.|nesday)?$|Sat(\.|urday)?$|T((ue?)|(hu?r?))\.?$
time on 12 hour clock
(?<Time>^(?:0?[1-9]:[0-5]|1(?=[012])\d:[0-5])\d(?:[ap]m)?)
time on 24 hour clock
^(?:(?:(?:0?[13578]|1[02])(\/|-|\.)31)\1|(?:(?:0?[13-9]|1[0-2])(\/|-|\.)(?:29|30)\2))(?:(?:1[6-9]|[2-9]\d)?\d{2})$|^(?:0?2(\/|-|\.)29\3(?:(?:(?:1[6-9]|[2-9]\d)?(?:0[48]|[2468][048]|[13579][26])|(?:(?:16|[2468][048]|[3579][26])00))))$|^(?:(?:0?[1-9])|(?:1[0-2]))(\/|-|\.)(?:0?[1-9]|1\d|2[0-8])\4(?:(?:1[6-9]|[2-9]\d)?\d{2})$
usa phone numbers
^\(?[\d]{3}\)?[\s-]?[\d]{3}[\s-]?[\d]{4}$
Unfortunately, you will not be able to move your CLR function(s) to SQL Azure. You will need to either use the normal string functions (PATINDEX, CHARINDEX, LIKE, and so on) or perform these operations outside of the database.
EDIT: Adding some information for the examples added to the question.
Email address
This one is always controversial because people disagree about which version of the RFC they want to support. The original didn't support apostrophes, for example (or at least people insist that it didn't - I haven't dug it up from the archives and read it myself, admittedly), and it has to be expanded quite often for new TLDs (once for 4-letter TLDs like .info, then again for 6-letter TLDs like .museum). I've often heard quite knowledgeable people state that perfect e-mail validation is impossible, and having previously worked for an e-mail service provider, I can tell you that it was a constantly moving target. But for the simplest approaches, see the question TSQL Email Validation (without regex).
One numeric digit
Probably the easiest one of the bunch:
WHERE @s LIKE '[0-9]';
Credit card numbers
Assuming you strip out dashes and spaces, which you should do in any case. Note that this isn't an actual check of the credit card number algorithm to ensure that the number itself is actually valid, just that it conforms to the general format (AmEx = 15 digits starting with a 3, the rest are 16 digits - Visa starts with a 4, MasterCard starts with a 5, Discover starts with 6 and I think there's one that starts with a 7 (though that may just be gift cards of some kind)):
WHERE @s + ' ' LIKE '[3-7]' + REPLICATE('[0-9]', 14) + '[0-9 ]';
If you want to be a little more precise at the cost of being long-winded, you can say:
WHERE (LEN(@s) = 15 AND @s LIKE '3' + REPLICATE('[0-9]', 14))
OR (LEN(@s) = 16 AND @s LIKE '[4-7]' + REPLICATE('[0-9]', 15));
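The format checks above deliberately skip the card number algorithm. If you do that part outside the database, the usual checksum is the Luhn algorithm. Here is a minimal Python sketch of it. It is not part of the T-SQL answer, and the function name passes_luhn is made up for illustration. It strips dashes and spaces first, as suggested above.

```python
def passes_luhn(card_number: str) -> bool:
    """Return True if the digits of card_number satisfy the Luhn checksum."""
    cleaned = card_number.replace("-", "").replace(" ", "")
    # Reject anything that is not plain ASCII digits after stripping separators.
    if len(cleaned) < 2 or not (cleaned.isascii() and cleaned.isdigit()):
        return False

    total = 0
    # Walk from the rightmost digit; double every second digit.
    for index, char in enumerate(reversed(cleaned)):
        digit = int(char)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0


if __name__ == "__main__":
    # Widely published test numbers, not real cards.
    print(passes_luhn("4111-1111-1111-1111"))
    print(passes_luhn("4111111111111112"))
```

A number that passes this check is only internally consistent. It says nothing about whether the card exists or is active.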
USA phone numbers
Again, assuming you're going to strip out parentheses, dashes and spaces first. Pretty sure a US area code can't start with a 1; if there are other rules, I am not aware of them.
WHERE @s LIKE '[2-9]' + REPLICATE('[0-9]', 9);
-----
I'm not going to go further, because a lot of the other expressions you've defined can be extrapolated from the above. Hopefully this gives you a start. You should be able to Google for some of the others to see how other people have replicated the patterns with T-SQL. Some of them (like days of the week) can probably just be checked against a table - it seems overkill to do invasive pattern matching for a set of 7 possible values. Similarly with a list of 1000 numbers or years, these are things that will be much easier (and probably more efficient) to check if the numeric value is in a table rather than convert it to a string and see if it matches some pattern.
I'll state again that a lot of this will be much better if you can cleanse and validate the data before it gets into the database in the first place. You should strive to do this wherever possible, because without CLR, you just can't do powerful RegEx inside SQL Server.
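As one illustration of cleansing before the data reaches the database, here is a minimal Python sketch for the US phone number case. It strips parentheses, dashes and whitespace, then applies the same rule as the LIKE pattern above (first digit 2-9, ten digits in total). The function name clean_us_phone and the choice to return None for bad input are assumptions for the example, not anything from your existing CLR code.

```python
import re
from typing import Optional

# Characters to strip before validating: parentheses, dashes, whitespace.
_SEPARATORS = re.compile(r"[()\s-]")
# Same rule as the T-SQL LIKE pattern: first digit 2-9, then nine more digits.
_US_PHONE = re.compile(r"[2-9][0-9]{9}")


def clean_us_phone(raw: str) -> Optional[str]:
    """Return the 10-digit number, or None if the input does not qualify."""
    digits = _SEPARATORS.sub("", raw)
    if _US_PHONE.fullmatch(digits):
        return digits
    return None


if __name__ == "__main__":
    for raw in ["(555) 444-5555", "155-444-5555", "555-4444"]:
        print(repr(raw), "->", clean_us_phone(raw))
```

The application would then insert only the cleaned value, so the database never has to do the pattern matching.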
Ken Henderson wrote about ways to replicate RegEx without CLR, but they require sp_OA* procedures, which are even less likely to ever see the light of day in Azure than CLR. Most of the other articles you'll find online use an approach similar to Ken's or make complex use of built-in string functions.
Which portions of RegEx specifically are you trying to replicate? Can you show an example of the input/output of one of your functions? Perhaps it will be easy to convert to get similar results using the built-in string functions like PATINDEX.

where can i get a regex or a library package for recognizing street address, postal code, state, phone numbers, emails and etc?

I have a bunch of unformatted docs.
I need regexes to capture street addresses, postal codes, states, phone numbers, emails, and other such common formats.
This site offers a searchable library of regexs: and this regular expression cookbook contains hundreds of examples of regex matching patterns
In the case of street addresses and to a certain extent, postal codes, regexs can only go so far. As a matter of fact, trying to regex a street is essentially impossible because of the huge variety of formats for a street address--even from within the United States.
A regex that has worked rather well for strictly formatted US-based postal codes is: ^\d{5}([-+]?\d{4})?$
In the US, ZIP Codes are typically formatted as follows:
12345
123456789
12345-6789
12345+6789
12345-67ND (yes, you read that right, sometimes the last two can be "ND")
The other issue that you'll have is when a zero-prefixed ZIP such as one from New England has been run through Excel and it has removed the leading zero, leaving a four-digit number. This is why a regex alone can't get the job done 100% even for something as "simple" as a US-based ZIP Code.
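To make the limits concrete, here is a minimal Python sketch that applies the ZIP regex above and adds a padding step for values that have lost leading zeros. The helper restore_leading_zeros is hypothetical. It assumes that any all-digit value shorter than five characters is a ZIP that lost its zeros, which will not always be true.

```python
import re

# The regex from above; fullmatch plays the role of the ^ and $ anchors.
ZIP_PATTERN = re.compile(r"\d{5}([-+]?\d{4})?", re.ASCII)


def is_strict_zip(value: str) -> bool:
    """True for 12345, 123456789, 12345-6789 and 12345+6789."""
    return ZIP_PATTERN.fullmatch(value) is not None


def restore_leading_zeros(value: str) -> str:
    """Pad short all-digit values back to five digits (an assumption)."""
    if value.isascii() and value.isdigit() and len(value) < 5:
        return value.zfill(5)
    return value


if __name__ == "__main__":
    for value in ["12345", "12345-6789", "12345-67ND", "2134"]:
        repaired = restore_leading_zeros(value)
        print(value, is_strict_zip(value), repaired, is_strict_zip(repaired))
```

Note that the 12345-67ND form is rejected by the regex, so even the formats listed above are not all covered.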
Depending upon the business needs, you'll want to investigate an address verification solution. Any online provider worth their salt can standardize and verify an address, which tells you if the address is real and can help reduce fraud and return shipping, etc.
In the interest of full disclosure, I'm the founder of SmartyStreets. We have an online address verification service which cleans, standardizes, and validates addresses. You're more than welcome to contact me personally for any questions you have.