Regex for city and street name - regex

Hi, I am looking for two regexes that describe:
1) a valid name of a street
2) a valid name of a city
Valid street names are:
Mainstreet.
Mainstreet
Main Street
Big New mainstreet
Mainstreet-New
Mains Str.
St. Alexander Street
abcÜüßäÄöÖàâäèéêëîï ôœùûüÿçÀÂ-ÄÈÉÊËÎÏÔŒÙÛÜŸÇ.
John Kennedy Street
Not valid street names are:
Mainstreet #+;:_*´`?=)(/&%$§!
Mainstreet#+;:_*´`?=)(/&%$§!
Mainstreet 2
Mainstreet..
Mainstreet§
Valid cities are:
Edinôœùûüÿ
Berlin.
St. Petersburg
New-Berlin
Aue-Bad Schlema
Frankfurt am Main
Nürnberg
Ab
New York CityßäÄöÖàâäèéêëîïôœùûüÿçÀÂ-ÄÈÉÊËÎÏÔŒÙÛÜŸ
Not valid cities are:
Edingburgh 123
Edingburg123
St. Andrews 12
Berlin,#+;:_*´`?=)(/&%$§!
Berlin__
The solution that I have at the moment comes very close but is not perfect:
For city and street name:
^[^\W\d_]+(?:[-\s][^\W\d_]+)*[.]?$
Unfortunately, it does not match these examples (the rest work fine):
St. Alexander Street
St. Petersburg
If you have simpler solutions, I am happy to learn something new! :-)

To make it match St. Alexander Street and St. Petersburg, you just need to add an optional dot after each letter-matching pattern:
^[^\W\d_]+\.?(?:[-\s][^\W\d_]+\.?)*$
#         ^^^                 ^^^
See the regex demo.
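As a quick sanity check, the pattern can be run against examples from the question. This is only a sketch in Python (the question does not name an engine); it assumes Python 3's `re` module, where `\W` and `\d` are Unicode-aware for `str` patterns, and the helper name `is_valid_name` is made up for illustration.

```python
import re

# Pattern from the answer: words made of letters, each optionally followed
# by a dot, separated by exactly one hyphen or whitespace character.
NAME_RE = re.compile(r"^[^\W\d_]+\.?(?:[-\s][^\W\d_]+\.?)*$")

def is_valid_name(text: str) -> bool:
    """Hypothetical helper: True if the whole string matches the pattern."""
    # fullmatch avoids the quirk that `$` also matches before a trailing newline.
    return NAME_RE.fullmatch(text) is not None

# Examples taken from the question.
VALID = [
    "Mainstreet.", "Mainstreet", "Main Street", "Big New mainstreet",
    "Mainstreet-New", "Mains Str.", "St. Alexander Street",
    "John Kennedy Street", "Berlin.", "St. Petersburg", "New-Berlin",
    "Aue-Bad Schlema", "Frankfurt am Main", "Nürnberg", "Ab",
]
INVALID = [
    "Mainstreet 2", "Mainstreet..", "Mainstreet§",
    "Edingburgh 123", "Edingburg123", "St. Andrews 12", "Berlin__",
]

for name in VALID + INVALID:
    print(f"{name!r}: {is_valid_name(name)}")
```

In other engines, whether `\W` covers non-ASCII letters depends on the Unicode settings, so the accented examples are worth re-checking there.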
Also, it might make sense to add a single apostrophe to the regex:
^[^\W\d_]+\.?(?:[-\s'’][^\W\d_]+\.?)*$
See the regex demo.
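The effect of the apostrophe variant can be illustrated with a small Python comparison. The sample names below are not from the question; they are hypothetical inputs chosen to show the difference.

```python
import re

# The two patterns from the answer: without and with apostrophes as separators.
WITHOUT_APOSTROPHE = re.compile(r"^[^\W\d_]+\.?(?:[-\s][^\W\d_]+\.?)*$")
WITH_APOSTROPHE = re.compile(r"^[^\W\d_]+\.?(?:[-\s'’][^\W\d_]+\.?)*$")

# Hypothetical sample names (not taken from the question).
samples = ["O'Connor Street", "King’s Lynn", "St. Petersburg", "Kings' Road"]

for name in samples:
    before = WITHOUT_APOSTROPHE.fullmatch(name) is not None
    after = WITH_APOSTROPHE.fullmatch(name) is not None
    print(f"{name!r}: without={before}, with={after}")
```

Because the apostrophe is treated as a separator that must be followed by letters, a trailing apostrophe as in "Kings' Road" is still rejected.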

Related

Using Regex in SOLR Query

I have a data set of street names and numbers which I need to search, e.g.:
12 HILL STREET
12A HILL STREET
12B HILL STREET
123 HILL STREET
12 HILARY STREET
If I search with q=(street_name:12\ HILL*), I get only:
12 HILL STREET
I want to obtain the following results:
12 HILL STREET
12A HILL STREET
12B HILL STREET
Is there a way to query in SOLR to return the results as the above example shows?
I have tried querying as:
q=(street_name:/12[A-Z]\ HILL*/)
but don't get anything back.
You can use
q=(street_name:/12[A-Z]* HILL.*/)
Here, the pattern means
12 - string starts with 12
[A-Z]* - zero or more ASCII uppercase letters
' ' - a space
HILL - HILL char sequence
.* - any zero or more chars other than line break chars as many as possible (so, the rest of the line).
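The pattern can be tried outside Solr as a rough check. The sketch below uses Python's `re.fullmatch` to approximate the fact that Lucene regular expressions are matched against the whole term; it assumes `street_name` is indexed as a single untokenized string, which the question does not confirm.

```python
import re

# Same pattern as in the Solr query, without the surrounding slashes.
# fullmatch approximates Lucene's whole-term matching (an assumption
# about how the field is indexed, see the note above).
PATTERN = re.compile(r"12[A-Z]* HILL.*")

streets = [
    "12 HILL STREET",
    "12A HILL STREET",
    "12B HILL STREET",
    "123 HILL STREET",
    "12 HILARY STREET",
]

matches = [s for s in streets if PATTERN.fullmatch(s)]
print(matches)
# → ['12 HILL STREET', '12A HILL STREET', '12B HILL STREET']
```

"123 HILL STREET" is excluded because `[A-Z]*` does not accept the digit 3, and "12 HILARY STREET" because the literal HILL is required.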

Regex - Creating validation to enforce that a string has 2+ words

If you have a moment, I need some help extending my regex. I am validating a response in a Google Form for the user's full name.
The validation requires:
That only letters are used
That the user inputs both the first and second name (at a minimum), separated by a space
So far I have come up with:
[a-zA-Z ]+]
But this lacks the check for a minimum of two words in a given string.
After an hour of fails and googling, I have admitted defeat and need your help!
Thanks in advance.
This should do the job:
/^[a-z]{2,}( [a-z]+)*?( [a-z]{2,}){1,}$/i
It matches:
john smith ◄ all lowercase
John Smith
John P E Smith
John Paul E Smith
John Paul Edward Smith
It does not match:
John
John S
John Paul S
John Paul Edward S
J0hn Smith  ◄ zero instead of the letter 'o'
John     Smith  ◄ multiple spaces
You can play with this fiddle.
Best regards
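The match lists above can be reproduced with a short Python sketch. It only exercises the pattern with Python's `re` module, using `re.IGNORECASE` in place of the `/i` flag; whether Google Forms behaves identically is an assumption, and the name `is_full_name` is made up for illustration.

```python
import re

# Pattern from the answer; re.IGNORECASE stands in for the /i flag.
FULL_NAME_RE = re.compile(r"^[a-z]{2,}( [a-z]+)*?( [a-z]{2,}){1,}$", re.IGNORECASE)

def is_full_name(text: str) -> bool:
    """Hypothetical helper: True if text looks like 'First [Middle...] Last'."""
    return FULL_NAME_RE.fullmatch(text) is not None

accepted = ["john smith", "John Smith", "John P E Smith",
            "John Paul E Smith", "John Paul Edward Smith"]
rejected = ["John", "John S", "John Paul S", "John Paul Edward S",
            "J0hn Smith", "John     Smith"]

for name in accepted + rejected:
    print(f"{name!r}: {is_full_name(name)}")
```

The first and last words need at least two letters each, while middle words may be single initials, which is why "John P E Smith" passes but "John S" does not.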

Regex pattern in salesforce apex

I am new to regex.
I have a String formatted like below
Street Name
City, StateCode ZipNumber
for example, the string can be like
50 Connecticut Avenue
Norwalk, CT 06850
or
123 6th Avenue
New York, NY 10013
or
4TH Highway 6
Rule, TX 79547
I am trying to construct a regex here.
But I cannot proceed, as I have little idea about regex.
Can you please help me?
The following might be enough:
^(?<Street>[^\n]+)\n(?<City>[^,]+), (?<StateCode>[A-Z]{2}) (?<Zip>\d+)$
It captures the following segments in different groups:
the first line in a group named Street
the part of the second line which precedes the comma in a group named City
the next two capital letters in a group named StateCode
the following digits in a group named Zip
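To see the four groups in action, here is a minimal sketch in Python rather than Apex. Python spells named groups as `(?P<name>...)` instead of `(?<name>...)`, so the pattern is adapted accordingly; the function name `parse_address` is hypothetical.

```python
import re
from typing import Optional

# Same structure as the pattern above, with Python's (?P<name>...) syntax.
ADDRESS_RE = re.compile(
    r"^(?P<Street>[^\n]+)\n(?P<City>[^,]+), (?P<StateCode>[A-Z]{2}) (?P<Zip>\d+)$"
)

def parse_address(text: str) -> Optional[dict]:
    """Hypothetical helper: split a two-line address into its named parts."""
    m = ADDRESS_RE.match(text)
    return m.groupdict() if m else None

print(parse_address("50 Connecticut Avenue\nNorwalk, CT 06850"))
print(parse_address("123 6th Avenue\nNew York, NY 10013"))
print(parse_address("4TH Highway 6\nRule, TX 79547"))
```

The sketch assumes the two lines are separated by a bare `\n`; input that does not have the two-line shape returns `None`.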

SAS Regex code to capture Business Address from 10-K company filings

Consider the following EDGAR 10-K SEC Company Filing
https://www.sec.gov/Archives/edgar/data/912382/000136231009004179/0001362310-09-004179.txt
BUSINESS ADDRESS:
STREET 1: 107 N PENNSYLVANIA ST
STREET 2: STE 600
CITY: INDIANAPOLIS
STATE: IN
ZIP: 46204
BUSINESS PHONE: 3172619000
MAIL ADDRESS:
STREET 1: 107 N PENNSYLVANIA ST
STREET 2: STE 600
CITY: INDIANAPOLIS
STATE: IN
ZIP: 46204
I need a regex in SAS to capture the fields STREET 1, STREET 2, CITY, STATE and ZIP under the Business Address, but not the Mailing Address. For example for STREET 1, I use STREET\s2\s*(.*) in SAS, but it ends up capturing the STREET 1 for Mailing address. Thanks!
This regex should work.
BUSINESS ADDRESS:\s*STREET\s1:\s*(.*)\s*STREET\s2:\s*(.*)
You can continue the pattern to capture each further field you need in a new pair of parentheses. Basically you're just making sure that you get the first answer after BUSINESS ADDRESS. The problem with the pattern you were using is that it can match in two separate locations (once under BUSINESS ADDRESS and once under MAIL ADDRESS), so you can end up with the wrong occurrence. Therefore you have to put something in the pattern that specifies which one you want.
In SAS you can use the prxposn function, with the second argument indicating the capture buffer (pair of parentheses) to retrieve. For example:
address1=prxposn(regex_pattern, 1, edgar10);
Best.
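The idea of anchoring on the BUSINESS ADDRESS header and continuing the pattern for the remaining fields can be sketched as follows. This is Python rather than SAS, shown only to illustrate the pattern; it assumes the filing text is available as one string and that every field, including STREET 2, is present.

```python
import re

# Excerpt from the filing header quoted in the question.
filing = """BUSINESS ADDRESS:
STREET 1: 107 N PENNSYLVANIA ST
STREET 2: STE 600
CITY: INDIANAPOLIS
STATE: IN
ZIP: 46204
BUSINESS PHONE: 3172619000
MAIL ADDRESS:
STREET 1: 107 N PENNSYLVANIA ST
STREET 2: STE 600
CITY: INDIANAPOLIS
STATE: IN
ZIP: 46204
"""

# The answer's pattern, continued with one capture group per extra field.
# `.` does not match a newline, so each (.*) stops at the end of its line.
BUSINESS_RE = re.compile(
    r"BUSINESS ADDRESS:\s*STREET\s1:\s*(.*)\s*STREET\s2:\s*(.*)"
    r"\s*CITY:\s*(.*)\s*STATE:\s*(.*)\s*ZIP:\s*(.*)"
)

m = BUSINESS_RE.search(filing)
if m:
    street1, street2, city, state, zip_code = m.groups()
    print(street1, street2, city, state, zip_code, sep=" | ")
```

Because the pattern starts with the literal BUSINESS ADDRESS: header, the MAIL ADDRESS block cannot satisfy it. A filing without a STREET 2 line would not match this sketch at all, so that field would need to be made optional in practice.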

Regex matching in street name vs street type

Currently I have a regex that separates an address into different groups. The issue I am facing occurs when no street type exists at the end of the address and one of the street types matches inside the street name.
For example, my list of street types includes the word "ridge". For the address "123 Cambridge Bay", the regex matches "camb" as the street name and "ridge" as the street type, when there should be no street type match at all, only the street name "Cambridge Bay".
Regex:
(\d+).*?((?:[a-z](?:[a-z]|[^\S\r\n])+)).*?((?:court|ct|street|st|drive|dr|lane|ln|road|rd|blvd|cir|trl|trail|crossing|xing|pl|place|ave|cv|cove|trce|trace|mnr|way|loop|bnd|bend|lndg|landing|path|pkwy|parkway|pass|rdg|ridge|vw)).*?((?:UNT|\#)[^\S\r\n]?\w|\w.*)?$|(\d+).*?((?:[a-z](?:[a-z]|[^\S\r\n])+))$
Simply adding a street type at the end, as in "123 cambridge bay dr", works fine.
You can see this working here
Testing
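The "ridge inside Cambridge" behaviour described in this question can be reproduced in isolation. The sketch below is not the asker's full pattern: it uses a hypothetical, shortened list of street types and shows one common way to avoid matches inside longer words, namely word boundaries (`\b`). Whether that fits the full regex is an assumption.

```python
import re
from typing import Optional

# Hypothetical, shortened list of street types (the original list is longer).
TYPES = r"(?:drive|dr|ridge|rdg|street|st)"

unbounded = re.compile(TYPES, re.IGNORECASE)
# \b on both sides requires the street type to be a whole word.
bounded = re.compile(r"\b" + TYPES + r"\b", re.IGNORECASE)

def street_type(pattern: re.Pattern, address: str) -> Optional[str]:
    """Return the first street type the pattern finds, or None."""
    m = pattern.search(address)
    return m.group(0) if m else None

print(street_type(unbounded, "123 Cambridge Bay"))    # finds "ridge" inside "Cambridge"
print(street_type(bounded, "123 Cambridge Bay"))      # no whole-word street type
print(street_type(bounded, "123 cambridge bay dr"))   # finds the trailing "dr"
```

Word boundaries only prevent matches inside a longer word; a street name that itself contains a whole-word type (for example a name starting with "Ridge") would still need separate handling.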