RegEx for matching Germany or Austria or CH Postcodes - regex

It is about my site, it is a ad portal and 3 geodata are installed in the system: Germany, Switzerland and Austria.
When I look for an advertisement in Germany, everything works correctly, I'm looking for zip code 68259 and a radius of 30 km. The results are correct, it shows all ads from 68259 Mannheim and the radius of 30 km.
Problem: The problem exists when I search in Switzerland or Austria: I search for the postal code 6000 Lucerne 1 PF and a radius of 30 km ... the results are wrong, I also find ads from Munich or Frankfurt which correspond to 300-500 km radius! I think the mistake is somewhere in the regex postal verification! Any advice what could be wrong???
// Germany Postcode
preg_match('/\b((?:0[1-46-9]\d{3})|(?:[1-357-9]\d{4})|(?:[4][0-24-9]\d{3})|(?:[6][013-9]\d{3}))\b/is', $this->search_code, $output);
if(!empty($output[0])){
$this->search_code = $output[0];
}else{
// Switzerland, Austria Postcode
preg_match('/\d{4}/', $this->search_code, $at_ch);
if(!empty($at_ch[0])){
$this->search_code = $at_ch[0];
}
}

The following regex will match codes for DE, CH & AU:
'/\b((?:0[1-46-9]\d{3})|(?:[1-357-9]\d{4})|(?:[4][0-24-9]\d{3})|(?:[6][013-9]\d{3})|(?:\d{4}))\b/is'
Examples
68259 Mannheim -> 68259
6000 Lucerne 1 PF -> 6000
1234 Musterstadt -> 1234

Related

Is there any query string that I can use with the QUERY function to get a group-wise maximum?

I know how to use the GROUP BY clause in the QUERY function with either a single or multiple fields. This can return the single row per grouping with the maximum value for one of the fields.
This page explains it nicely using these queries and image:
=query({A2:B10},"Select Col1,min(Col2) group by Col1",1)
=query({A14:C22},"Select Col1,Col2,min(Col3) group by Col1,Col2",1)
However, what if I only want a query that returns the corresponding values for the most recent row, grouped by multiple fields? Is there a query that can do this?
Example
Source Table
created_at
first_name
last_name
email
address
city
st
zip
amount
4/12/2022 19:15:00
Ava
Anderson
ava#domain.com
123 Main St
Anytown
IL
12345
1.00
8/30/2022 21:38:00
Brooklyn
Brown
bb#domain.com
234 Lake Rd
Baytown
CA
54321
2.00
2/12/2022 16:58:00
Ava
Anderson
ava#new.com
123 Main St
Anytown
IL
12345
3.00
4/28/2022 01:41:00
Brooklyn
Brown
brook#acme.com
456 Ace Ave
Bigtown
NY
23456
4.00
5/03/2022 17:10:00
Brooklyn
Brown
bb#domain.com
234 Lake Rd
Baytown
CA
54321
5.00
Desired Query Result
Group by first_name, last_name, address, city, st, and zip, but return the created_at, email, and amount for the maximum (most recent) value of created_at:
created_at
first_name
last_name
email
address
city
st
zip
amount
4/12/2022 19:15:00
Ava
Anderson
ava#domain.com
123 Main St
Anytown
IL
12345
1.00
8/30/2022 21:38:00
Brooklyn
Brown
bb#domain.com
234 Lake Rd
Baytown
CA
54321
2.00
4/28/2022 01:41:00
Brooklyn
Brown
brook#acme.com
456 Ace Ave
Bigtown
NY
23456
4.00
Is such a query possible in Google Sheets?
Use this formula
=QUERY({QUERY(A1:I, " Select max(A),min(B),min(C),min(D),min(E),min(F),min(G),min(H),min(I) Group by B,C,E,F,G,H ", 1)},
" Select * Where Col1 is Not null ")
I believe that this is the formula you need:
=ARRAY_CONSTRAIN(SORTN(SORT(
QUERY({A1:I9,INDEX(IFERROR(REGEXEXTRACT(D1:D9,"(\D+)#")))},
"where Col2 is not null"),
10,1,1,0),9^9,2,10,1),9^9,9)
(Do adjust the formula according to your ranges and locale)
For the formula to work we create the helper column
INDEX(IFERROR(REGEXEXTRACT(D1:D9,"(\D+)#"))).
We also use 9^9 which equals to 387420489 rows, making sure that all rows are included in our sorting calculations.
Finally in our ARRAY_CONSTRAIN function we return the first 9 columns discarding the 10th helper column.
Functions used:
REGEXEXTRACT
IFERROR
INDEX
QUERY
SORT
SORTN
ARRAY_CONSTRAIN

How do I create a pivot table with weighted averages from a table in PowerBI?

I have data in the following format:
Building
Tenant
Type
Floor
Sq Ft
Rent
Term Length
1 Example Way
Jeff
Renewal
5
100
100
6
47 Fake Street
Tom
New
3
500
200
12
I need to create a visualisation in PowerBI that displays a pivot table of attribute by tenant, with a weighted averages (by square foot) column, like this:
Jeff
Tom
Weighted Average (by Sq Ft)
Building
1 Example Way
47 Fake Street
-
Type
Renewal
New
-
Floor
5
3
-
Sq Ft
100
500
433.3333333
Rent
100
200
183.3333333
Term Length (months)
6
12
11
I have unpivoted the original data, like this:
Tenant
Attribute
Value
Jeff
Building
1 Example Way
Jeff
Type
Renewal
Jeff
Floor
5
Jeff
Sq Ft
100
Jeff
Rent
100
Jeff
Term Length (months)
6
Tom
Building
47 Fake Street
Tom
Type
New
Tom
Floor
3
Tom
Sq Ft
500
Tom
Rent
200
Tom
Term Length (months)
12
I can almost create what I need from the unpivoted data using a matrix (as below), but I can't calculate the weighted averages column from that matrix.
Jeff
Tom
Building
1 Example Way
47 Fake Street
Type
Renewal
New
Floor
5
3
Sq Ft
100
500
Rent
100
200
Term Length (months)
6
12
I can also create a table with my attributes as headers (instead of in a column). This displays the right values and lets me calculate weighted averages (as below).
Building
Type
Floor
Sq Ft
Rent
Term Length (months)
Jeff
1 Example Way
Renewal
5
100
100
6
Tom
47 Fake Street
New
3
500
200
12
Weighted Average (by Sq Ft)
-
-
-
433.3333333
183.3333333
11
However, it's important that these values are displayed vertically instead of horizontally. This is pretty straightforward in Excel, but I can't figure out how to do it in PowerBI. I hope this is clear. Can anyone help?
Thanks!

city population difference

I have an input file
Chicago 500
NewWork 200
California 100
I need difference of second column as output for each city with each other
Chicago Newyork 300
Chicago California 100
Newyork Chicago -300
Newyork California 100
California Chicago -400
California Newyork -100
I tried alot but not able to figure out exact and correct way to implement in map reduce . Please give me some solution
Here is a pseudocode. I use Python often, so it looks more like it. For this to work, you must know the total number of lines (i.e., cities here) and use that for N prior to running the job.
map(dummy, line):
city, pop = line.split()
for idx in 1:N
emit(idx, (city, pop))
reduce(idx, city_data):
city_data.sort() # sort by city to ensure indices are consistent
city, pop = city_data[idx]
for i in 1:N
if idx != i:
c, p = city_data[i]
dist = pop - p
emit(city, (c, dist))

Self Join in Pandas: Merge all rows with the equivalent multi-index

I have one dataframe in the following form:
df = pd.read_csv('data/original.csv', sep = ',', names=["Date", "Gran", "Country", "Region", "Commodity", "Type", "Price"], header=0)
I'm trying to do a self join on the index Date, Gran, Country, Region producing rows in the form of
Date, Gran, Country, Region, CommodityX, TypeX, Price X, Commodity Y, Type Y, Prixe Y, Commodity Z, Type Z, Price Z
Every row should have all the different commodities and prices of a specific region.
Is there a simple way of doing this?
Any help is much appreciated!
Note: I simplified the example by ignoring a few attributes
Input Example:
Date Country Region Commodity Price
1 03/01/2014 India Vishakhapatnam Rice 25
2 03/01/2014 India Vishakhapatnam Tomato 30
3 03/01/2014 India Vishakhapatnam Oil 50
4 03/01/2014 India Delhi Wheat 10
5 03/01/2014 India Delhi Jowar 60
6 03/01/2014 India Delhi Bajra 10
Output Example:
Date Country Region Commodit1 Price1 Commodity2 Price2 Commodity3 Price3
1 03/01/2014 India Vishakhapatnam Rice 25 Tomato 30 Oil 50
2 03/01/2014 India Delhi Wheat 10 Jowar 60 Bajra 10
What you want to do is called a reshape (specifically, from long to wide). See this answer for more information.
Unfortunately as far as I can tell pandas doesn't have a simple way to do that. I adapted the answer in the other thread to your problem:
df['idx'] = df.groupby(['Date','Country','Region']).cumcount()
df.pivot(index= ['Date','Country','Region'], columns='idx')[['Commodity','Price']]
Does that solve your problem?

Space Formatting data to csv

For quite some time I have been trying to format space separated data to a CSV structure.
Initial position
The initial data table is given by:
Dr. Arun Raykar MBBS, MS - ENT 9 years experience Ear-Nose-Throat (ENT) Specialist SHAKTHI E.N.T CARE Malleswaram, Bangalore INR 250 MON-SAT7:00PM-9:00PM Book Appointment
Dr. Hema Sanath C BHMS, CFN 0 years experience Homeopath Sankirana Homeopathic Clinic Kalyan Nagar, Bangalore INR 250 MON-SAT10:00AM-2:00PM6:30PM-8:00PM Book Appointment
Dr. Hema Ahuja BDS,M Phil 33 years experience Dentist V2 E City Family Dental Center Electronics City, Bangalore INR 200 MON-SUN10:00AM-8:00PM Book Appointment
It contains lots of spaces and unnecessary information throughout. The information is present somewhat like this
Doctor's name | Degree | Years of experience | Specialization | Hospital name | Address | Fees | Schedule | and an unnecessary book appointment field.
I want to convert it to the following format
Doctor's name,Specialization,Hospital name,Address,Fees,Schedule
So the current data should look like this
Dr. Arun Raykar,Ear-Nose-Throat (ENT) Specialist,SHAKTHI E.N.T CARE,Malleswaram,INR 250,MON-SAT7:00PM-9:00PM
Dr. Hema Sanath,Homeopath,Sankirana Homeopathic Clinic,Kalyan Nagar,INR 250,MON-SAT10:00AM-2:00PM6:30PM-8:00PM
Dr. Hema Ahuja,Dentist,V2 E City Family Dental Center,Electronics City,INR 200,MON-SUN10:00AM-8:00PM
Till now I have succeeded in removing the Book Appointment field.
Problem
However I am facing difficulties in classifying the hospital's name. As the spacing in it varies a lot. Is this problem feasible?
EDIT
The output of cat -A file is the following:
Dr. Arun Raykar MBBS, MS - ENT 9 years experience Ear-Nose-Throat (ENT) Specialist SHAKTHI E.N.T CARE ^I Malleswaram, Bangalore INR 250 MON-SAT7:00PM-9:00PM Book Appointment $
Dr. Hema Sanath C BHMS, CFN 0 years experience Homeopath Sankirana Homeopathic Clinic ^I Kalyan Nagar, Bangalore INR 250 MON-SAT10:00AM-2:00PM6:30PM-8:00PM Book Appointment $
Dr. Hema Ahuja BDS,M Phil 33 years experience Dentist V2 E City Family Dental Center ^I Electronics City, Bangalore INR 200 MON-SUN10:00AM-8:00PM Book Appointment
There's no straightforward way to separate the specialization from the hospital name, but with some assumptions, you could perhaps use perl to do this:
perl -pe 's/^(\S+\s+\S+\s+\S+).+experience\s([^\t]+?)\s+(\b[A-Z0-9]{2}[^\t]+?|(?:(?!\b[A-Z0-9]{2})[^\t])*)\s+\t\s+([^,]+,).+?(INR.+?PM)\s+.*/\1,\2,\3,\4\5/' file
Gives:
Dr. Arun Raykar,Ear-Nose-Throat (ENT) Specialist,SHAKTHI E.N.T CARE,Malleswaram,INR 250 MON-SAT7:00PM-9:00PM
Dr. Hema Sanath,Homeopath,Sankirana Homeopathic Clinic,Kalyan Nagar,INR 250 MON-SAT10:00AM-2:00PM6:30PM-8:00PM
Dr. Hema Ahuja,Dentist,V2 E City Family Dental Center,Electronics City,INR 200 MON-SUN10:00AM-8:00PM
And since it's perl based regex, you can use regex101 to get a glimpse of how it works through the regex debugger. The regex is quite straightforward, but the fact that there are many parts can make it appear daunting.
Warning: The above is able to separate the specialization based on two things:
It tries to find the first occurrence of space followed by two uppercase characters or digits and starts matching as the hospital name when it finds it; or
If there are no consecutive uppercase characters or digits, it takes only the first word as the specialization and the rest as the hospital name.
I know it might not solve the complete problems as there are always lines that won't fit the above rules, but that can get you started on cleaning these up. If there is anything incorrectly separated (i.e. when the specialization consists of more than 1 word and the hospital name doesn't have two consecutive upper/digit) you will have one word of the specialization correctly placed, and the rest in the hospital name.
Unfortunately, based on your input, there's no way to separate specialisation with hospital name. The other fields can be captured, albeit inelegantly and with gawk (probably >= 4.0, but I think 3.x should work):
$ awk -F" \t " -v OFS="," -v S=" " '
{
sub(/\s+$/, "");
split($2, Data, /[ ,]{2,}/);
Address = Data[1];
split($2, Data, / +/);
nData = length(Data);
Schedule = Data[nData - 2];
Fees = Data[nData - 4] S Data[nData - 3];
split($1, Data, / +/);
Name = Data[1] S Data[2] S Data[3]; # assume all names are Dr. Xxx Xxx only
match($1, /[0-9]+ years experience /);
SpecializationHospital = substr($1, RSTART + RLENGTH);
print Name, SpecializationHospital, Address, Fees, Schedule;
} ' data.txt
Dr. Arun Raykar,Ear-Nose-Throat (ENT) Specialist SHAKTHI E.N.T CARE,Malleswaram,INR 250,MON-SAT7:00PM-9:00PM
Dr. Hema Sanath,Homeopath Sankirana Homeopathic Clinic,Kalyan Nagar,INR 250,MON-SAT10:00AM-2:00PM6:30PM-8:00PM
Dr. Hema Ahuja,Dentist V2 E City Family Dental Center,Electronics City,INR 200,MON-SUN10:00AM-8:00PM