city population difference - mapreduce

I have an input file:
Chicago 500
Newyork 200
California 100
I need the difference of the second column as output, for each city paired with every other city:
Chicago Newyork 300
Chicago California 400
Newyork Chicago -300
Newyork California 100
California Chicago -400
California Newyork -100
I have tried a lot but cannot figure out the correct way to implement this in MapReduce. Please suggest a solution.

Here is some pseudocode. I use Python often, so it looks a lot like Python. For this to work, you must know the total number of lines (i.e., cities here) before running the job and use that value for N.
map(dummy, line):
    city, pop = line.split()
    for idx in range(N):  # send every record to all N reducer keys
        emit(idx, (city, int(pop)))
reduce(idx, city_data):
    city_data.sort()  # sort by city to ensure indices are consistent
    city, pop = city_data[idx]  # the city this reducer is responsible for
    for i in range(N):
        if idx != i:
            c, p = city_data[i]
            dist = pop - p
            emit(city, (c, dist))
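To check the idea locally, the pseudocode above can be simulated in plain Python without a MapReduce framework. This is only a sketch: `map_phase`, `reduce_phase` and `run_job` are hypothetical names, the shuffle step is imitated with a dictionary, and the input is assumed to be whitespace-separated `city population` lines.

```python
from collections import defaultdict


def map_phase(lines, n):
    """Emit every (city, pop) record once for each of the n reducer keys."""
    for line in lines:
        city, pop = line.split()
        for idx in range(n):
            yield idx, (city, int(pop))


def reduce_phase(idx, city_data):
    """Compare the idx-th city (in sorted order) with every other city."""
    records = sorted(city_data)  # same order in every reducer
    city, pop = records[idx]
    for i, (other, other_pop) in enumerate(records):
        if i != idx:
            yield city, other, pop - other_pop


def run_job(lines):
    """Imitate the shuffle step with a dict, then run each reducer."""
    n = len(lines)  # N must be known before the job starts
    groups = defaultdict(list)
    for key, value in map_phase(lines, n):
        groups[key].append(value)
    results = []
    for key in sorted(groups):
        results.extend(reduce_phase(key, groups[key]))
    return results


lines = ["Chicago 500", "Newyork 200", "California 100"]
results = run_job(lines)
for city, other, diff in results:
    print(city, other, diff)
```

Note that every reducer receives all N records, so this approach moves N x N records through the shuffle and is only practical when the number of cities is small.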

RegEx for matching Germany or Austria or CH Postcodes

This concerns my site, an ad portal with geodata installed for three countries: Germany, Switzerland and Austria.
When I search for an advertisement in Germany, everything works correctly. I search for postal code 68259 with a radius of 30 km, and the results show all ads from 68259 Mannheim and within the 30 km radius.
The problem occurs when I search in Switzerland or Austria. I search for the postal code 6000 Lucerne 1 PF with a radius of 30 km, and the results are wrong: I also get ads from Munich or Frankfurt, which are 300-500 km away. I think the mistake is somewhere in the regex postal code verification. Any advice on what could be wrong?
// Germany Postcode
preg_match('/\b((?:0[1-46-9]\d{3})|(?:[1-357-9]\d{4})|(?:[4][0-24-9]\d{3})|(?:[6][013-9]\d{3}))\b/is', $this->search_code, $output);
if(!empty($output[0])){
    $this->search_code = $output[0];
}else{
    // Switzerland, Austria Postcode
    preg_match('/\d{4}/', $this->search_code, $at_ch);
    if(!empty($at_ch[0])){
        $this->search_code = $at_ch[0];
    }
}
The following regex will match codes for DE, CH & AT:
'/\b((?:0[1-46-9]\d{3})|(?:[1-357-9]\d{4})|(?:[4][0-24-9]\d{3})|(?:[6][013-9]\d{3})|(?:\d{4}))\b/is'
Examples
68259 Mannheim -> 68259
6000 Lucerne 1 PF -> 6000
1234 Musterstadt -> 1234
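The combined pattern can be tried out quickly in Python, whose `re` module accepts the same syntax for this expression. This is only a sketch for checking the pattern: it is not the site's PHP code, `extract_postcode` is a hypothetical helper name, and the `/is` flags are dropped because the pattern contains no letters or dots.

```python
import re

# Pattern copied from the answer above: German 5-digit codes first,
# then a generic 4-digit fallback for Switzerland and Austria.
POSTCODE_RE = re.compile(
    r"\b((?:0[1-46-9]\d{3})|(?:[1-357-9]\d{4})|(?:[4][0-24-9]\d{3})"
    r"|(?:[6][013-9]\d{3})|(?:\d{4}))\b"
)


def extract_postcode(text):
    """Return the first postcode-like token in text, or None."""
    match = POSTCODE_RE.search(text)
    return match.group(1) if match else None


for query in ("68259 Mannheim", "6000 Lucerne 1 PF", "1234 Musterstadt"):
    print(query, "->", extract_postcode(query))
```

Because the whole alternation sits between `\b` anchors, the 4-digit fallback cannot match a fragment of a longer number, which is what the unanchored `\d{4}` in the original code allowed.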

Is there any query string that I can use with the QUERY function to get a group-wise maximum?

I know how to use the GROUP BY clause in the QUERY function with either a single or multiple fields. This can return the single row per grouping with the maximum value for one of the fields.
This page explains it nicely using these queries and image:
=query({A2:B10},"Select Col1,min(Col2) group by Col1",1)
=query({A14:C22},"Select Col1,Col2,min(Col3) group by Col1,Col2",1)
However, what if I only want a query that returns the corresponding values for the most recent row, grouped by multiple fields? Is there a query that can do this?
Example
Source Table
created_at         | first_name | last_name | email          | address     | city    | st | zip   | amount
4/12/2022 19:15:00 | Ava        | Anderson  | ava#domain.com | 123 Main St | Anytown | IL | 12345 | 1.00
8/30/2022 21:38:00 | Brooklyn   | Brown     | bb#domain.com  | 234 Lake Rd | Baytown | CA | 54321 | 2.00
2/12/2022 16:58:00 | Ava        | Anderson  | ava#new.com    | 123 Main St | Anytown | IL | 12345 | 3.00
4/28/2022 01:41:00 | Brooklyn   | Brown     | brook#acme.com | 456 Ace Ave | Bigtown | NY | 23456 | 4.00
5/03/2022 17:10:00 | Brooklyn   | Brown     | bb#domain.com  | 234 Lake Rd | Baytown | CA | 54321 | 5.00
Desired Query Result
Group by first_name, last_name, address, city, st, and zip, but return the created_at, email, and amount for the maximum (most recent) value of created_at:
created_at         | first_name | last_name | email          | address     | city    | st | zip   | amount
4/12/2022 19:15:00 | Ava        | Anderson  | ava#domain.com | 123 Main St | Anytown | IL | 12345 | 1.00
8/30/2022 21:38:00 | Brooklyn   | Brown     | bb#domain.com  | 234 Lake Rd | Baytown | CA | 54321 | 2.00
4/28/2022 01:41:00 | Brooklyn   | Brown     | brook#acme.com | 456 Ace Ave | Bigtown | NY | 23456 | 4.00
Is such a query possible in Google Sheets?
Use this formula
=QUERY({QUERY(A1:I, " Select max(A),min(B),min(C),min(D),min(E),min(F),min(G),min(H),min(I) Group by B,C,E,F,G,H ", 1)},
" Select * Where Col1 is Not null ")
I believe that this is the formula you need:
=ARRAY_CONSTRAIN(SORTN(SORT(
QUERY({A1:I9,INDEX(IFERROR(REGEXEXTRACT(D1:D9,"(\D+)#")))},
"where Col2 is not null"),
10,1,1,0),9^9,2,10,1),9^9,9)
(Do adjust the formula according to your ranges and locale)
For the formula to work we create the helper column
INDEX(IFERROR(REGEXEXTRACT(D1:D9,"(\D+)#"))).
We also use 9^9, which equals 387420489, as the row limit so that all rows are included in our sorting calculations.
Finally in our ARRAY_CONSTRAIN function we return the first 9 columns discarding the 10th helper column.
Functions used:
REGEXEXTRACT
IFERROR
INDEX
QUERY
SORT
SORTN
ARRAY_CONSTRAIN
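For readers who want to verify the expected result outside Sheets, the "latest row per group" logic from the question can be sketched in plain Python. This is an illustration of the logic only, not a translation of either formula: `latest_per_group` is a hypothetical helper, and the rows are copied from the source table in the question.

```python
from datetime import datetime

# Rows copied from the source table in the question.
ROWS = [
    ("4/12/2022 19:15:00", "Ava", "Anderson", "ava#domain.com", "123 Main St", "Anytown", "IL", "12345", 1.00),
    ("8/30/2022 21:38:00", "Brooklyn", "Brown", "bb#domain.com", "234 Lake Rd", "Baytown", "CA", "54321", 2.00),
    ("2/12/2022 16:58:00", "Ava", "Anderson", "ava#new.com", "123 Main St", "Anytown", "IL", "12345", 3.00),
    ("4/28/2022 01:41:00", "Brooklyn", "Brown", "brook#acme.com", "456 Ace Ave", "Bigtown", "NY", "23456", 4.00),
    ("5/03/2022 17:10:00", "Brooklyn", "Brown", "bb#domain.com", "234 Lake Rd", "Baytown", "CA", "54321", 5.00),
]


def latest_per_group(rows):
    """Keep, for each group, the whole row with the newest created_at."""
    latest = {}
    for row in rows:
        # Group key: first_name, last_name, address, city, st, zip.
        key = (row[1], row[2]) + row[4:8]
        created = datetime.strptime(row[0], "%m/%d/%Y %H:%M:%S")
        if key not in latest or created > latest[key][0]:
            latest[key] = (created, row)
    return [row for _, row in latest.values()]


result = latest_per_group(ROWS)
for row in result:
    print(row)
```

The point of the sketch is that the whole row travels with the maximum timestamp, which is what a plain `max()` aggregate per column cannot guarantee.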

How do I create a pivot table with weighted averages from a table in PowerBI?

I have data in the following format:
Building       | Tenant | Type    | Floor | Sq Ft | Rent | Term Length
1 Example Way  | Jeff   | Renewal | 5     | 100   | 100  | 6
47 Fake Street | Tom    | New     | 3     | 500   | 200  | 12
I need to create a visualisation in PowerBI that displays a pivot table of attributes by tenant, with a weighted-average (by square foot) column, like this:
Attribute            | Jeff          | Tom            | Weighted Average (by Sq Ft)
Building             | 1 Example Way | 47 Fake Street | -
Type                 | Renewal       | New            | -
Floor                | 5             | 3              | -
Sq Ft                | 100           | 500            | 433.3333333
Rent                 | 100           | 200            | 183.3333333
Term Length (months) | 6             | 12             | 11
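For reference, the weighted-average column in the table above follows the usual formula sum(value x weight) / sum(weight), with Sq Ft as the weight. The arithmetic can be checked with a short Python sketch. This is not DAX or a PowerBI measure, and `weighted_average` is a hypothetical helper name.

```python
def weighted_average(values, weights):
    """Return sum(value * weight) / sum(weight)."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Values for Jeff and Tom, taken from the table in the question.
sq_ft = [100, 500]
rent = [100, 200]
term_months = [6, 12]

print(weighted_average(sq_ft, sq_ft))        # Sq Ft weighted by Sq Ft
print(weighted_average(rent, sq_ft))         # Rent weighted by Sq Ft
print(weighted_average(term_months, sq_ft))  # Term length weighted by Sq Ft
```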
I have unpivoted the original data, like this:
Tenant | Attribute            | Value
Jeff   | Building             | 1 Example Way
Jeff   | Type                 | Renewal
Jeff   | Floor                | 5
Jeff   | Sq Ft                | 100
Jeff   | Rent                 | 100
Jeff   | Term Length (months) | 6
Tom    | Building             | 47 Fake Street
Tom    | Type                 | New
Tom    | Floor                | 3
Tom    | Sq Ft                | 500
Tom    | Rent                 | 200
Tom    | Term Length (months) | 12
I can almost create what I need from the unpivoted data using a matrix (as below), but I can't calculate the weighted averages column from that matrix.
Attribute            | Jeff          | Tom
Building             | 1 Example Way | 47 Fake Street
Type                 | Renewal       | New
Floor                | 5             | 3
Sq Ft                | 100           | 500
Rent                 | 100           | 200
Term Length (months) | 6             | 12
I can also create a table with my attributes as headers (instead of in a column). This displays the right values and lets me calculate weighted averages (as below).
Tenant                      | Building       | Type    | Floor | Sq Ft       | Rent        | Term Length (months)
Jeff                        | 1 Example Way  | Renewal | 5     | 100         | 100         | 6
Tom                         | 47 Fake Street | New     | 3     | 500         | 200         | 12
Weighted Average (by Sq Ft) | -              | -       | -     | 433.3333333 | 183.3333333 | 11
However, it's important that these values are displayed vertically instead of horizontally. This is pretty straightforward in Excel, but I can't figure out how to do it in PowerBI. I hope this is clear. Can anyone help?
Thanks!

Django order_by on two fields, but one needs to be reversed

I figured out how to order by multiple fields thanks to this post:
Django: Order_by multiple fields
But I ran into an issue: one field is a number and the other is a person's name. I need the results ordered by dues in descending order, from highest to lowest, and within each dues amount sorted by name in alphabetical order.
I tried this:
invoice_items = InvoiceItem.objects.filter(invoice__exact=inv.id).order_by('dues', 'provider').reverse()
It gets the dues right, going from the group with the highest dues (like 350) down to the lowest, but the names are also reversed, so the top of the list has names starting with Z, Y, etc.
What I need:
Bob 350
Carl 350
Mike 350
Thomas 350
April 200
Gary 200
etc..
instead what I get:
Thomas 350
Mike 350
Carl 350
Bob 350
Gary 200
April 200
Not sure the right syntax to achieve this.
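I think this will work. Prefixing a field name with '-' makes order_by sort that field in descending order, while 'provider' stays ascending:
invoice_items = (InvoiceItem.objects
                 .filter(invoice__exact=inv.id)
                 .order_by('-dues', 'provider'))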
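The same mixed ordering can be reproduced in plain Python, which may help when checking the expected result without a database. This is a sketch using `sorted` with a tuple key, not Django ORM code, and the sample tuples are taken from the question.

```python
# (provider, dues) pairs from the question, deliberately shuffled.
items = [
    ("Thomas", 350),
    ("April", 200),
    ("Bob", 350),
    ("Gary", 200),
    ("Mike", 350),
    ("Carl", 350),
]

# Negating dues sorts it descending; the name stays ascending.
ordered = sorted(items, key=lambda item: (-item[1], item[0]))

for name, dues in ordered:
    print(name, dues)
```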

Self Join in Pandas: Merge all rows with the equivalent multi-index

I have one dataframe in the following form:
df = pd.read_csv('data/original.csv', sep = ',', names=["Date", "Gran", "Country", "Region", "Commodity", "Type", "Price"], header=0)
I'm trying to do a self join on the index Date, Gran, Country, Region, producing rows in the form of:
Date, Gran, Country, Region, Commodity X, Type X, Price X, Commodity Y, Type Y, Price Y, Commodity Z, Type Z, Price Z
Every row should have all the different commodities and prices of a specific region.
Is there a simple way of doing this?
Any help is much appreciated!
Note: I simplified the example by ignoring a few attributes
Input Example:
   Date        Country  Region          Commodity  Price
1  03/01/2014  India    Vishakhapatnam  Rice       25
2  03/01/2014  India    Vishakhapatnam  Tomato     30
3  03/01/2014  India    Vishakhapatnam  Oil        50
4  03/01/2014  India    Delhi           Wheat      10
5  03/01/2014  India    Delhi           Jowar      60
6  03/01/2014  India    Delhi           Bajra      10
Output Example:
   Date        Country  Region          Commodity1  Price1  Commodity2  Price2  Commodity3  Price3
1  03/01/2014  India    Vishakhapatnam  Rice        25      Tomato      30      Oil         50
2  03/01/2014  India    Delhi           Wheat       10      Jowar       60      Bajra       10
What you want to do is called a reshape (specifically, from long to wide). See this answer for more information.
Unfortunately as far as I can tell pandas doesn't have a simple way to do that. I adapted the answer in the other thread to your problem:
df['idx'] = df.groupby(['Date','Country','Region']).cumcount()
df.pivot(index= ['Date','Country','Region'], columns='idx')[['Commodity','Price']]
Does that solve your problem?
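To make the long-to-wide reshape concrete, here is the same idea in plain Python, independent of pandas. It is only a sketch of what the reshape does, not a replacement for the pandas code above: `long_to_wide` is a hypothetical helper, and the rows are the simplified example from the question.

```python
from collections import defaultdict

# (Date, Country, Region, Commodity, Price) rows from the input example.
rows = [
    ("03/01/2014", "India", "Vishakhapatnam", "Rice", 25),
    ("03/01/2014", "India", "Vishakhapatnam", "Tomato", 30),
    ("03/01/2014", "India", "Vishakhapatnam", "Oil", 50),
    ("03/01/2014", "India", "Delhi", "Wheat", 10),
    ("03/01/2014", "India", "Delhi", "Jowar", 60),
    ("03/01/2014", "India", "Delhi", "Bajra", 10),
]


def long_to_wide(rows):
    """Collect all (commodity, price) pairs of a group into one wide row."""
    grouped = defaultdict(list)
    for date, country, region, commodity, price in rows:
        grouped[(date, country, region)].extend([commodity, price])
    return [key + tuple(values) for key, values in grouped.items()]


wide = long_to_wide(rows)
for row in wide:
    print(row)
```

The position of each pair inside a group plays the role of the `idx` column produced by `cumcount()`.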