Ascending order sorting in Dataframe-Pandas after GroupBy

import pandas as pd
from pandas import DataFrame

# Load the compensation data (skiprows kept as in the original post)
df1 = pd.read_csv('/Users/nirmal/Desktop/Python/Assignments/Data/employee_compensation.csv', sep=',', skiprows=(1,1))
# Mean Total Compensation per (Organization Group, Department)
dfq2 = DataFrame(df1.groupby(["Organization Group", "Department"])['Total Compensation'].mean())
dfq2
I need to sort the Total Compensation column in descending order, so that the Departments are reordered within each Organization Group. The order of the Organization Group column itself should not change.

You can use sort_values with sort_index (df here is the grouped DataFrame, dfq2 in the question):
print (df.sort_values('Total Compensation', ascending=False)
         .sort_index(level=0, sort_remaining=False))
                                                                  Total Compensation
Organization Group                Department
Community Health                  Academy of Sciences                  107319.727692
                                  Public Health                         96190.190140
                                  Arts Commission                       94339.597388
                                  Asian Art Museum                      71401.520060
Culture & Recreation              Law Library                          188424.362222
                                  City Attorney                        166082.677561
                                  Controller                           104515.234944
                                  Assessor/Recorder                     89994.260614
                                  City Planning                         89022.876966
                                  Board of Supervisors                  78801.347641
                                  War Memorial                          76250.068022
                                  Public Library                        70446.352147
                                  Civil Service Commission              67966.756559
                                  Fine Arts Museum                      44205.439895
                                  Recreation and Park Commission        38912.859465
                                  Elections                             20493.166618
General Administration & Finance  Ethics Commission                     98631.380366
Another solution with reset_index, sort_values and set_index:
print (df.reset_index()
         .sort_values(['Organization Group','Total Compensation'], ascending=[True, False])
         .set_index(['Organization Group','Department']))
                                                                  Total Compensation
Organization Group                Department
Community Health                  Academy of Sciences                  107319.727692
                                  Public Health                         96190.190140
                                  Arts Commission                       94339.597388
                                  Asian Art Museum                      71401.520060
Culture & Recreation              Law Library                          188424.362222
                                  City Attorney                        166082.677561
                                  Controller                           104515.234944
                                  Assessor/Recorder                     89994.260614
                                  City Planning                         89022.876966
                                  Board of Supervisors                  78801.347641
                                  War Memorial                          76250.068022
                                  Public Library                        70446.352147
                                  Civil Service Commission              67966.756559
                                  Fine Arts Museum                      44205.439895
                                  Recreation and Park Commission        38912.859465
                                  Elections                             20493.166618
General Administration & Finance  Ethics Commission                     98631.380366
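To try the second approach without the CSV file, here is a minimal self-contained sketch. The toy rows, the short group and department labels, and the variable names raw, means and result are made up for illustration; only the column names come from the question.

```python
import pandas as pd

# Hypothetical toy rows standing in for employee_compensation.csv
raw = pd.DataFrame({
    "Organization Group": ["B", "B", "A", "A", "A", "B"],
    "Department": ["x", "y", "p", "q", "p", "y"],
    "Total Compensation": [10.0, 30.0, 5.0, 50.0, 15.0, 50.0],
})

# Same aggregation as in the question: mean per (group, department)
means = (
    raw.groupby(["Organization Group", "Department"])["Total Compensation"]
    .mean()
    .to_frame()
)

# Groups stay in ascending order; compensation is descending within each group
result = (
    means.reset_index()
    .sort_values(["Organization Group", "Total Compensation"],
                 ascending=[True, False])
    .set_index(["Organization Group", "Department"])
)
print(result)
```

Passing a list to ascending lets each sort key have its own direction, which is what keeps the Organization Group order fixed while the departments are reordered.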

Related

Best imputation for principal component analysis

I have a list of assets. I use these assets to create an asset index using pca:
local assets newspaper magazine clock water borehole table bed study bicycle cart car motorcycle tractor electricity fridge airconditioner fan washing_machine vacuum
alpha $assets
pca $assets
predict asset_idx
label var asset_idx "pupil asset ownership index: created using pca"
egen asset_idx_std = std(asset_idx)
However, I am missing about 3% of assets for each variable. This amounts to 30% after pairwise deletion. As such, I wish to impute the missing variables so every student with less than 10% of observations missing is not deleted during PCA. I do not otherwise wish to mi set, but can't work out another way:
mi set wide
mi register imputed newspaper magazine clock water borehole table bed study bicycle cart car motorcycle tractor electricity fridge airconditioner fan washing_machine vacuum computer internet radio tv vcr dvd cd cassette camera digi_camera vid_cam landline phone
set seed 1234
mi impute logit newspaper magazine clock water borehole table bed study bicycle cart car motorcycle tractor electricity fridge airconditioner fan washing_machine vacuum computer internet radio tv vcr dvd cd cassette camera digi_camera vid_cam landline phone, force add(1)
Unfortunately, this is only successfully imputing a small fraction of missing observations:
------------------------------------------------------------------
                   |               Observations per m
                   |----------------------------------------------
          Variable |   Complete   Incomplete   Imputed |     Total
-------------------+-----------------------------------+----------
         newspaper |       7007          110         4 |      7117
------------------------------------------------------------------
Any suggestions on appropriate imputation are much appreciated.

Power BI and DAX: How can I show which products a customer has bought [easy], also which products they have NOT bought

I have a nice big Excel sheet of sales lines: customer, product, date and a bunch of other fields.
I can add a visual which shows how many sales by customer and by product, and I used CALCULATE with CONCATENATEX and SUMMARIZE to define a measure which gives a comma-separated string listing all the products bought by each customer for my selected date range, e.g.:
Customer      All products Bought
Amazing       sweets, crisps, donuts
BBD corp      crisps, donuts
Craazy kidz   donuts
Dunking bits  sweets, donuts
I'd like another column added which displays all the products NOT bought, i.e. marketing opportunities:
Customer      All products Bought     Opportunities
Amazing       sweets, crisps, donuts
BBD corp      crisps, donuts          sweets
Craazy kidz   donuts                  sweets, crisps
Dunking bits  sweets, donuts          crisps
I'd like the table to be flexible so that, e.g., if I added a Sweet vs Savoury slicer, it would show just those products which map to each of Sweet or Savoury.
Very grateful for any tips.

How to only match once?

My input text looks something like
•$389 PER MONTH FOR 36 months $4,314 DUE AT SIGNING SUGGESTED DEALER
CONTRIBUTION OF $1,385 Offer not valid in Puerto Rico. Lease financing
available on new 2017 BMW 330i xDrive Sports Wagon models from
participating BMW Centers through BMW Financial Services through
January 02, 2018, to eligible, qualified customers with excellent
credit history who meet BMW Financial Services' credit requirements.
Monthly lease payments of $389 per month for 36 months is based on an
adjusted capitalized cost of $38,210 (MSRP of $45,595, including
destination and handling fee of $995, less $3,000 customer down, $0
security deposit and suggested dealer contribution of $1,385 and
$3,000 Holiday Lease Credit). Actual MSRP may vary. Dealer
contribution may vary and could affect your actual monthly lease
payment. Cash due at signing includes $3,000 down payment, $389 first
month's payment, $925 acquisition fee and $0 security deposit. Lessee
responsible for insurance during the lease term, excess wear and tear
as defined in the lease contract, $0.25/mile over 30,000 miles and a
disposition fee of $350 at lease end. Not all customers will qualify
for security deposit waiver. Tax, title, license and registration fees
are additional fees due at signing. Advertised payment does not
include applicable taxes. Purchase option at lease end, excluding tax,
title and government fees, is $27,813. Offer valid through January 02,
2018 and may be combined with other offers unless otherwise stated.
Models pictured may be shown with metallic paint and/or additional
accessories. Visit your authorized BMW Center for important details.
©2017 BMW of North America, LLC. The BMW name, model names and logo
are registered trademarks.
I want to match the number before months, which in this case is
36
However, it's coming out twice. Is there a way to make it match only once?
I'm using \d+\s*(?=( months+?))
Thank you for your help!

Maybe you could match from the start of the string up to the first number that precedes months, and capture that number in a group (the value is then in group 1):
^.*?(\d+)(?=\smonths)
Match the start of the string ^
Match any character 0 or more times non greedy .*?
Capture 1 or more digits in a group (\d+)
Positive lookahead if what follows is a whitespace and months (?=\smonths)
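For illustration, a minimal Python sketch of the difference between the two patterns. The text variable is a shortened excerpt of the ad copy from the question, and the re.DOTALL flag is an added assumption (not part of the answer above) so that ^.*? can also cross line breaks when the first hit is not on the first line.

```python
import re

# Shortened excerpt of the question's text; "36 months" occurs twice
text = (
    "$389 PER MONTH FOR 36 months $4,314 DUE AT SIGNING SUGGESTED DEALER\n"
    "CONTRIBUTION OF $1,385 Offer not valid in Puerto Rico.\n"
    "Monthly lease payments of $389 per month for 36 months is based on an\n"
    "adjusted capitalized cost of $38,210"
)

# A lookahead-only pattern finds every occurrence when all matches are requested
all_hits = re.findall(r"\d+(?=\s*months)", text)

# Anchoring at the start and capturing the number yields at most one match
anchored = re.search(r"^.*?(\d+)(?=\smonths)", text, re.DOTALL)
first = anchored.group(1) if anchored else None

print(all_hits)
print(first)
```

In Python specifically, re.search already stops at the first match, so the anchor matters mostly in tools that return all matches by default.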

If/Then formula to treat a number as negative conditional upon accompanying text

I am creating a spreadsheet to track rent owed from agents at a real estate company. I have drop-down lists created to pick from the agent and the type of transaction (Payment, Rent Incurred, Credit Issued, Fee Issued, Payment Reversed, Copier Usage). What I want is to just enter the number in the Amount field without making it a negative, and have a formula that will add it or subtract it based on the type of transaction it is. I want this:
If C2="Payment" OR "Credit Issued", then D2=POSITIVE
If C2="Rent Incurred" OR "Fee Issued" OR "Payment Reversed" OR "Copier Usage", then D2=NEGATIVE
Is it possible to do a formula like that in conjunction with a running balance?
Like this:
Date     Agent  Transaction    Amount    Balance
8/25/15  Mike   Payment        $150.00   -$150.00
9/1/15   Joyce  Rent Incurred  $200.00   $ 50.00
9/1/15   Mike   Rent Incurred  $150.00   $200.00
9/1/15   Chris  Rent Incurred  $250.00   $450.00
9/6/15   Chris  Payment        $250.00   $200.00
9/15/15  Joyce  Fee Issued     $ 25.00   $225.00
9/21/15  Joyce  Payment        $225.00   $  0.00
I guess I want my balance formula to be like this:
=if(c2,"Payment""credit issued",d2*-1,c2,"Rent incurred""fee issued""copier usage""payment reversed",d2)+c1
Does this make any sense at all?

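A formula that may be entered in E2 and filled down:
=N(E1)+D2*IF(OR(C2="Payment",C2="Credit Issued"),-1,1)
N(E1) returns 0 when E1 holds the header text, so the same formula also works in the first data row. Payments and credits are multiplied by -1 (they reduce the balance owed); every other transaction type is added.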
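To check the arithmetic, the same sign rule can be replayed over the sample rows from the question. This is only a Python sketch of the running balance, not a spreadsheet feature; the names CREDITS, rows and signed are hypothetical.

```python
from itertools import accumulate

# Transaction types that reduce the balance owed (multiplied by -1 in the formula)
CREDITS = {"Payment", "Credit Issued"}

# (Transaction, Amount) pairs from the sample table in the question
rows = [
    ("Payment", 150.00),
    ("Rent Incurred", 200.00),
    ("Rent Incurred", 150.00),
    ("Rent Incurred", 250.00),
    ("Payment", 250.00),
    ("Fee Issued", 25.00),
    ("Payment", 225.00),
]

def signed(transaction, amount):
    """Mirror IF(OR(C2="Payment",C2="Credit Issued"),-1,1) * D2."""
    return -amount if transaction in CREDITS else amount

# Running total, like N(E1)+... filled down column E
balances = list(accumulate(signed(t, a) for t, a in rows))
print(balances)
```

The resulting balances match the Balance column in the question's sample table, ending at zero.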

How to parse a long dataframe of text using regular expressions into a dataframe [R]

I have a giant data frame which I would like to turn into a more usable format. It is based on a copy-pasted text file of a schedule, where the entries have a consistent format. So, I know that each day will have the format:
###
Title - Date
First event
Time: 11:00 AM
Location: Address
Address line 2
Second event description
Time: 12:00 AM
Location: Address
Address
###
What I am having trouble with is figuring out how to parse this. Basically, I want to store everything between the "###"'s as a single day, and then add events based on how many times the above format repeats, and make a string or datetime entry based on if letters are following a "Time:" or a "Location:".
However, I am really having trouble with this. I have tried putting it all into a giant dataframe where each line is a row, and then adding dummies for location rows, time rows, etc. as separate columns, but I am not sure how to translate that into discrete days or events. Any help would be appreciated.
Data is public, so a sample is below -- it is a giant dataframe with one row for each row of text:
*Text*
###
The Public Schedule for Mayor Rahm Emanuel – January 5, 2014
There are no public events scheduled at this time.
###
The Public Schedule for Mayor Rahm Emanuel – January 6, 2014
Mayor Rahm Emanuel will greet and thank snow clearing teams from the Department of Streets and Sanitation.
Time: 11:30 AM
Location: Department of Streets and Sanitation
Salt Station
West Grand Avenue and North Rockwell Street
Chicago, IL*
*There will be no media availability.
Mayor Rahm Emanuel and City Officials will provide an update on the City’s efforts to provide services during the extreme weather.
Time: 2:00 PM
Location: Office of Emergency Management and Communications
Headquarters
1411 West Madison Street
Chicago, IL*
*There will be a media availability following.
Mayor Rahm Emanuel will greet and visit with residents taking advantage of a City warming center.
Time: 3:00 PM
Location: Department of Family and Support Services
10 South Kedzie Avenue
Chicago, IL**
*There will be no media availability.
**Members of the media must respect the privacy of residents at the facility, and can only film City of Chicago employees.
###
Edit:
An example output I would like is something like (sorry the code here is broken, not sure why!):
Date          Time      Description           Location
December 4th  9:00 AM   A housewarming party  1211 Main St.
December 5th  11:00 AM  Another big event     1234 Main St.
If at all possible.
EDIT 2:
Basically -- I know how to pull all this stuff out of the columns. I think my issue may really be reshaping the data intelligently. I have split it into this giant dataframe with one column where each row is a string which corresponds to a row of text in the original schedule. I then have a bunch of columns like "is_time", "is_new_entry", "is_location" which are 1 if the row is a time, new entry beginning, or location. I just don't know how to reshape this into the output above.
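One possible way to think about the reshaping, sketched in Python rather than R and under assumptions that are not in the question: the line after each "###" is the day's title with the date after the last dash, a line is an event description exactly when the next line starts with "Time:", lines starting with "*" are footnotes, and any other line after "Location:" continues the address. The function name parse_schedule and the sample list are hypothetical.

```python
import re

def parse_schedule(lines):
    """Turn schedule lines into one dict per event (Date, Time, Description, Location)."""
    events, date, current, new_day = [], None, None, False
    for i, raw in enumerate(lines):
        line = raw.strip()
        nxt = lines[i + 1].strip() if i + 1 < len(lines) else ""
        if line == "###":                      # day separator
            new_day, current = True, None
        elif new_day:                          # title line: date follows the last dash
            date = re.split(r"\s[-–]\s", line)[-1]
            new_day = False
        elif nxt.startswith("Time:"):          # description precedes a Time: line
            current = {"Date": date, "Time": None,
                       "Description": line, "Location": ""}
            events.append(current)
        elif current is None or line.startswith("*"):
            continue                           # "no events" notes and footnotes
        elif line.startswith("Time:"):
            current["Time"] = line[len("Time:"):].strip()
        elif line.startswith("Location:"):
            current["Location"] = line[len("Location:"):].strip()
        elif current["Location"]:              # address continuation line
            current["Location"] += ", " + line.rstrip("*")
    return events

sample = [
    "###",
    "The Public Schedule for Mayor Rahm Emanuel – January 5, 2014",
    "There are no public events scheduled at this time.",
    "###",
    "The Public Schedule for Mayor Rahm Emanuel – January 6, 2014",
    "Mayor Rahm Emanuel will greet and visit with residents taking advantage of a City warming center.",
    "Time: 3:00 PM",
    "Location: Department of Family and Support Services",
    "10 South Kedzie Avenue",
    "Chicago, IL**",
    "*There will be no media availability.",
    "###",
]
events = parse_schedule(sample)
print(events)
```

The list of dicts maps directly onto a table with one row per event; the same look-ahead idea (flag a row as a description when the following row is a time row) could be applied to the existing is_time column.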
