Using postal code digits for Belgium/Netherlands at checkout on my website - OpenCart

I'm looking for a database of postal code digits for Belgium and the Netherlands, to use when a customer checks out his/her shopping cart.
I did find a map of them, but of course it has to be turned into code. (The map uses 2-digit codes.)
For example:
3630 Maasmechelen = 2-digit postal code (36)
Now the 2-digit code is 36, but everything starting with 36 should be possible, for example 3600, 3601, 3602 (hamlets), and of course their names have to be selectable.
Another example: 3600 Genk (the client should be able to select this in full)
But doing it manually takes ages, so I would like to know whether a database of all the digits exists somewhere.
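Once a list of postal codes and place names is available from some source, the 2-digit lookup itself is simple. Below is a minimal Python sketch of that lookup, not OpenCart code. The names `POSTAL_CODES` and `places_for_prefix` are hypothetical, and the three entries are only an illustrative sample (Genk and Maasmechelen are taken from the question).

```python
# Illustrative sample only; a real shop would load the full list from a data file.
POSTAL_CODES = [
    ("1000", "Brussel"),
    ("3600", "Genk"),
    ("3630", "Maasmechelen"),
]


def places_for_prefix(prefix, codes=POSTAL_CODES):
    """Return (postal_code, name) pairs whose code starts with the given digits."""
    return [(code, name) for code, name in codes if code.startswith(prefix)]


print(places_for_prefix("36"))  # [('3600', 'Genk'), ('3630', 'Maasmechelen')]
```

The returned pairs could then feed the options of a select box at checkout.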

Related

Pandas str match for German addresses

I have a quite annoying problem in designing a regex to prepare addresses for geocoding with Nominatim. I am working with German addresses which look like this:
Von-der-Leyen-Platz 1 47506 Neukirchen-Vluyn
Schildstraße 52531 Übach-Palenberg
Finkenratherstraße Straße 4a 52134 Herzogenrath
Format: Street Number Postal code City
What I want to achieve is that the letters directly after street numbers (such as the a in 4a) are dropped. For this I am using the following regex:
(\d+).*?\s+(.+)
It matches the third address as 4 52134 Herzogenrath, but not as Finkenratherstraße 4 52134 Herzogenrath. Another problem is the second address, as it does not have a street number. That is why I want to create a regex that can filter for the following structure:
Street name {number if available} Postal code (5 digits) City name
The postal code always has 5 digits and the structure is always the same just that sometimes the street number is missing.
Is there any way to design this as a regex?
For your data, this could work:
import pandas as pd

# sample data
s = pd.Series(['Von-der-Leyen-Platz 1 47506 Neukirchen-Vluyn',
               'Schildstraße 52531 Übach-Palenberg',
               'Finkenratherstraße Straße 4a 52134 Herzogenrath'])
# extract street, optional house number, 5-digit postal code and city into named columns
s.str.extract(r'(?P<Street>\D+)\s?(?P<Number>\d+\S*)?\s(?P<Postal>\d{5})\s(?P<City>\D+)$')
Output:
                      Street Number Postal              City
0        Von-der-Leyen-Platz      1  47506  Neukirchen-Vluyn
1               Schildstraße    NaN  52531   Übach-Palenberg
2  Finkenratherstraße Straße     4a  52134      Herzogenrath
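The same idea can be sketched without pandas, using only Python's `re` module. This is a variant of the pattern above rather than the exact one: the street group is made lazy so it does not swallow trailing whitespace. It assumes street names contain no digits. `ADDRESS_RE` and `parse_address` are hypothetical names.

```python
import re

ADDRESS_RE = re.compile(
    r"^(?P<street>\D+?)\s+"        # street name: shortest run of non-digits
    r"(?:(?P<number>\d+\S*)\s+)?"  # optional house number such as 1 or 4a
    r"(?P<postal>\d{5})\s+"        # German postal code, always 5 digits
    r"(?P<city>.+)$"               # everything after the postal code
)


def parse_address(text):
    """Split an address into street, number, postal and city, or return None."""
    m = ADDRESS_RE.match(text.strip())
    return m.groupdict() if m else None


for line in [
    "Von-der-Leyen-Platz 1 47506 Neukirchen-Vluyn",
    "Schildstraße 52531 Übach-Palenberg",
    "Finkenratherstraße Straße 4a 52134 Herzogenrath",
]:
    print(parse_address(line))
```

For an address without a house number, the `number` entry is `None`.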

identifying phone numbers (different format but same number)

On my website, people can register with their telephone number. Only one account per phone number is allowed. The problem is: a phone number can be written in several ways.
I'm from Belgium. Our (mobile) phone numbers are like this: 04xx xx xx xx (for example 04 99 12 34 56). Belgium's country code is 0032, so 0032499 12 34 56 is also a valid phone number, just like +32499 12 34 56 is.
So now I have three phone numbers which are exactly the same but are written differently, and the system does not recognize them as the same.
Possible solution (will not work)
Every Belgian phone number has the same ending:
00324xxxxxxxx
+324xxxxxxxx
04xxxxxxxx
I could check the last 9 digits (starting with 4), but the problem is that not only people from Belgium can register; people from other countries can register too. A US mobile phone number does not end with 4xxxxxxxx, so I could not do this check on those numbers.
Possible solution
Adding a dropdown with all the country codes, and normalizing every phone number before it is submitted.
For Belgian phone numbers: +32 (dropdown) 0499 12 34 56 (input) would become +32 (dropdown) 499 12 34 56 (strip the 0).
Possible solution 2
I could use an API like this one (https://www.searchbug.com/api/identify-phone-number.aspx) but it's not free. Is there a free solution I can host myself?
Supposing your website is implemented in PHP, you can normalize the phone numbers before inserting them into your database as follows:
<?php
$input_number = "00324xxxxxxxx";
// or e.g. $input_number = "04xxxxxxxx";
$number = $input_number;
// drop whitespace so that input such as "04 99 12 34 56" is handled too
$number = preg_replace("/\s+/", "", $number);
// international prefix 00 becomes +
$number = preg_replace("/^00/", "+", $number);
// national format: assumes a Belgian number, see remark below
$number = preg_replace("/^0/", "+32", $number);
echo "input: ", $input_number, ", normalized form: ", $number, "\n";
If the numbers in the format 04xxxxxxxx are not guaranteed to be Belgian numbers, add a drop-down to force your users to provide a country prefix.
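For comparison, here is the same normalization as a small Python sketch, run against the three spellings from the question. `normalize_phone` and its `default_country_code` parameter are hypothetical names. Treating a leading single 0 as a Belgian national number is an assumption that only holds if a country drop-down (or similar) supplies the right default.

```python
import re


def normalize_phone(raw, default_country_code="32"):
    """Return the number in +<country code><subscriber number> form."""
    number = re.sub(r"[^\d+]", "", raw)  # drop spaces, dots, dashes and so on
    if number.startswith("00"):
        number = "+" + number[2:]  # international prefix 00 becomes +
    elif number.startswith("0"):
        # national format: assume the default country (Belgium here)
        number = "+" + default_country_code + number[1:]
    return number


for raw in ["04 99 12 34 56", "0032499 12 34 56", "+32499 12 34 56"]:
    print(raw, "->", normalize_phone(raw))  # each prints +32499123456
```

Storing only the normalized form, with a unique constraint on that column, would then enforce one account per phone number.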

Regex in Hive QL (RLIKE) - performance?

I'm wondering how/if I can improve the regex I'm using in a query. I have a set of identifiers for certain user groups. They can be in two main formats:
X123 or XY12, (type 1)
any two letter combo, excluding XY (type 2)
Type 1 groups always are of length 4. It's either letter X followed by a number between 100 and 999 (inclusive) OR XY followed by numbers between 0 and 99 (padded to length 2 with zeros).
Type 2 groups are 2 letter strings, with any letter allowed, excluding XY (although my query doesn't specify this).
A user can belong to multiple groups, in which case the different groups are separated by a pound symbol (#). Here's an example:
groups      user     age
X124        john     23
XY22#AB     mike     33
AB          peter    21
X122#XY01   francis  43
I want to count rows in which at least one group in the second format appears, i.e. where the user is not exclusively a member of groups in the first format.
I need to catch all rows (i.e. users) which don't belong exclusively to the first type of groups. In the example above, I want to exclude users john and francis because they are members only of type 1 groups.
On the other hand, mike is OK because he's a member of the AB group (i.e. a group of type 2).
I'm currently doing it like this:
select
count(*)
from
users
where
groups not rlike '^(X[Y1-9][0-9]{2,2})(#X[Y1-9][0-9]{2,2})*$'
Is this bad performance-wise? And how should I approach fixing it?
I want to count rows in which at least one group in second format appears.
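It seems a bit simpler, then, to select the rows where groups matches (RLIKE) this pattern:
\b(?:(?!XY)[A-Z]{2})\b
\b is a word boundary. It doesn't consume a character; it asserts that the position has a word character (letter, digit, or underscore) on one side and a non-word character, or the start or end of the string, on the other.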
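To check the pattern against the sample rows from the question, here is a small Python sketch. Hive's RLIKE uses Java regular expressions, so this only illustrates the matching logic, on the assumption that the two engines agree for a pattern this simple. `TYPE2_RE` and `rows` are hypothetical names.

```python
import re

# A two-letter token that is not XY, delimited by word boundaries.
TYPE2_RE = re.compile(r"\b(?!XY)[A-Z]{2}\b")

rows = [
    ("X124", "john"),
    ("XY22#AB", "mike"),
    ("AB", "peter"),
    ("X122#XY01", "francis"),
]

# Keep users with at least one type 2 group.
matching = [user for groups, user in rows if TYPE2_RE.search(groups)]
print(matching)  # ['mike', 'peter']
```

The X in X124 and the XY in XY22 are rejected because the character after the two-character candidate is a digit, so there is no word boundary at that point.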

Regular Expression for group

I have a text where I need to find 3 groups of strings.
I tried the expression \r?\n\r?\n\r?[0-9A-Z].*\d{7} but I find only 2 strings instead of 3.
I should highlight 00170784, HEDINV and 00173575, but I get only 00170784 and 00173575.
This is the text:
BUY
USM4
200 contracts
04/28/2014 15:50
00170784
56
contracts
HEDINV
64
contracts
00173575
80
contracts
At average price of USD 134.375
SELL
USM4
200 contracts
04/28/2014 15:50
00170784
56
contracts
HEDINV
64
contracts
00173575
80
contracts
At average price of USD 134.5938
May I suggest using this instead?
^\d{8}$|^[A-Z]{6}$
It has two alternatives, not capture groups: one matches a whole line consisting of an 8-digit sequence, the other a whole line consisting of 6 uppercase letters. With multiline mode enabled, so that ^ and $ match at each line break, that grabs what you're looking for, unless there's a specific reason you're using all those line-break matches.
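A quick way to try the suggested pattern is the Python sketch below, run on the first block of the sample text. The question does not say which regex engine is in use, so Python's `re` with the `re.MULTILINE` flag is an assumption here. `LINE_RE` and `text` are hypothetical names.

```python
import re

text = """BUY
USM4
200 contracts
04/28/2014 15:50
00170784
56
contracts
HEDINV
64
contracts
00173575
80
contracts
At average price of USD 134.375"""

# re.MULTILINE makes ^ and $ match at the start and end of every line.
LINE_RE = re.compile(r"^\d{8}$|^[A-Z]{6}$", re.MULTILINE)

print(LINE_RE.findall(text))  # ['00170784', 'HEDINV', '00173575']
```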

List all the numbers in various ranges (1 - 6; 1, 2, 5 & 23; etc)

I am working on a project where I have a column of Ward Group data, and the data is formatted as Municipality Name - Type (City, Town, Village) Wards. So for example:
ADAMS - T 1 & 2
Cumberland V 1 - 5
Marshfield - C 1 - 20, 23 - 25 & 27
....
etc.
To link this information to a government-provided ward shapefile, I need to have one line per Ward. So for example, I need to turn the above information into:
ADAMS - T 1
ADAMS - T 2
Cumberland V 1
Cumberland V 2
Cumberland V 3
Cumberland V 4
Cumberland V 5
Marshfield - C 1
....
Marshfield - C 20
Marshfield - C 23
Marshfield - C 24
Marshfield - C 25
Marshfield - C 27
Also, each Ward Group line has several columns of election data, that I want copied into each new row. So for example:
Ward Group        Total Votes
ADAMS - T 1 & 2   300
needs to become:
Ward Group        Total Votes
ADAMS - T 1       300
ADAMS - T 2       300
Is there a way to do this in Excel that isn't by hand, either formula or VBA?
Although the question is not tagged as such, I suspect you are going to require code for this, which means your question "must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results." or it could be deemed off topic. So perhaps start with formulae (assuming each 'row' is a single cell):
Maybe use a regex to make the Municipality Name - Type part more consistent (say, look for letter space letter space and replace it with letter space hyphen space letter space), so that Cumberland V 1 - 5 ends up in the same format as the others, i.e. Cumberland - V 1 - 5.
Then use =SUBSTITUTE(A1,"-",",",1) to replace the first hyphen (the one before the Type) with a comma (you can't rely on hyphens as a delimiter because they are also used to indicate ranges).
Then "fix" the results (Paste Special Values to replace the formulae).
Then parse out the results (Text to Columns) with "," and "&" as the (only) delimiters.
Insert three new columns immediately to the right of the leftmost of these columns and parse the (existing) left column with Space and hyphen as the delimiters.
Applied to your example, at this point the results should look like this: [image of the resulting worksheet not included]
Columns C & D are the lower and upper bounds of ranges, but I'm afraid E3 alone represents a range (though this could be split out with Text to Columns again). A point to watch out for is confusing 27-29 with 27 & 29.
More could be done with formulae, but I think VBA is a better bet, unless perhaps this is definitely a 'once off' requirement. For example, the votes, if in a different cell, could be attached after the pages (?) have been split out, via say a lookup function.
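To make the expansion logic concrete, here is a minimal sketch in Python rather than Excel or VBA; the same steps could be ported to a VBA macro. It assumes the municipality name and type contain no digits, that commas and ampersands separate items, and that a hyphen between two numbers marks an inclusive range. `expand_ward_group` and `expand_rows` are hypothetical names.

```python
import re


def expand_ward_group(label):
    """Turn 'ADAMS - T 1 & 2' into ['ADAMS - T 1', 'ADAMS - T 2']."""
    first_digit = re.search(r"\d", label)
    if first_digit is None:
        return [label.strip()]  # no ward numbers at all: keep the label as is
    prefix = label[:first_digit.start()].strip()
    wards = []
    # Items are separated by ',' or '&'; each item is a single ward or a range.
    for part in re.split(r"[,&]", label[first_digit.start():]):
        bounds = [int(n) for n in re.findall(r"\d+", part)]
        if len(bounds) == 2:
            wards.extend(range(bounds[0], bounds[1] + 1))  # inclusive range
        elif len(bounds) == 1:
            wards.append(bounds[0])
    return [f"{prefix} {ward}" for ward in wards]


def expand_rows(rows):
    """Copy the election data of each ward group onto every ward in it."""
    return [(ward, votes) for label, votes in rows
            for ward in expand_ward_group(label)]


print(expand_rows([("ADAMS - T 1 & 2", 300)]))
# [('ADAMS - T 1', 300), ('ADAMS - T 2', 300)]
```

Because each item is parsed separately, 27 - 29 expands to three wards while 27 & 29 stays as two, which avoids the confusion mentioned above.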