Pandas multi-indexing from a flattened DataFrame - python-2.7

I want to reshape a flat DataFrame into one with multi-indexed columns.
The DataFrame looks like this:
goods category month stock
a c1 1 5
a c1 2 0
a c1 3 0
a c2 1 0
a c2 2 10
a c2 3 0
b c1 1 30
b c1 2 0
b c1 3 10
b c2 1 0
b c2 2 40
b c2 3 0
And I would like it to look like this:
         stock
goods        a       b
category    c1  c2  c1  c2
month
1            5   0  30   0
2            5  10  30  40
3            5  10  10  40
I tried a few things with groupby and stack, but I couldn't find a good way. Does anyone know how to do this?

With unstack (to use this you first have to set the multi-index):
In [48]: df.set_index(['goods', 'category', 'month']).unstack([0,1])
Out[48]:
         stock
goods        a       b
category    c1  c2  c1  c2
month
1            5   0  30   0
2            0  10   0  40
3            0   0  10   0
An alternative with pivot_table. Be aware that if several rows share the same combination of goods/category/month, their values are averaged by default; another function can be specified with aggfunc:
In [54]: df.pivot_table(columns=['goods', 'category'], index='month', values='stock')
Out[54]:
goods      a       b
category  c1  c2  c1  c2
month
1          5   0  30   0
2          0  10   0  40
3          0   0  10   0
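Both approaches above can be reproduced with a small self-contained sketch. The data is copied from the question; the variable names wide and pivoted are made up for illustration, and aggfunc="sum" is just one possible choice to replace the default mean.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "goods":    ["a"] * 6 + ["b"] * 6,
    "category": (["c1"] * 3 + ["c2"] * 3) * 2,
    "month":    [1, 2, 3] * 4,
    "stock":    [5, 0, 0, 0, 10, 0, 30, 0, 10, 0, 40, 0],
})

# unstack: move 'goods' and 'category' from the row index into the columns.
# This raises an error if a goods/category/month combination occurs twice.
wide = df.set_index(["goods", "category", "month"]).unstack(["goods", "category"])

# pivot_table: same shape, but duplicate combinations are aggregated.
# aggfunc is set explicitly here instead of relying on the default mean.
pivoted = df.pivot_table(index="month", columns=["goods", "category"],
                         values="stock", aggfunc="sum")

print(wide)
print(pivoted)
```

Note that wide keeps an extra top column level named stock, while pivoted does not, because values was passed as a single column name.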

Subtracting values based on a index column and using a condition in the same column in DAX

I've read a lot of material on Stack Overflow about this, but I'm still not able to reproduce it.
Sample data set.
Asset  Value  Index
A      10     1
B      15     1
C      20     1
A      11     2
B      17     2
C      24     2
A      18     3
B      25     3
C      30     3
What I want to do is subtract each asset's previous value from its current value, based on the Index column.
Ex:
Asset A [1] -> 10
Asset A [2] -> 11
11 - 10 = 1
So the table would be like this.
Asset  Value  Index  Diff
A      10     1      0
B      15     1      0
C      20     1      0
A      11     2      1
B      17     2      2
C      24     2      4
A      18     3      7
B      25     3      8
C      30     3      6
This needs to be done using DAX.
Can you help me?
Best regards!
I just did this and it worked.
Diff =
VAR Assets = 'Table'[Asset]
VAR Ind = 'Table'[Index] - 1
RETURN
-- Index starts at 1 in the sample data, so Ind = 0 marks the first row of each asset.
IF(Ind = 0, 0,
    'Table'[Value] - CALCULATE(SUM('Table'[Value]),
        FILTER('Table', 'Table'[Asset] = Assets && 'Table'[Index] = Ind)))
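To check the expected Diff values outside Power BI, the same per-asset "current minus previous" logic can be sketched in pandas. This is only a cross-check under two assumptions: the data is the sample table from the question, and Index values are consecutive within each asset, so the previous row is also the previous index. The name table is made up for illustration.

```python
import pandas as pd

# Sample data copied from the question.
table = pd.DataFrame({
    "Asset": ["A", "B", "C"] * 3,
    "Value": [10, 15, 20, 11, 17, 24, 18, 25, 30],
    "Index": [1, 1, 1, 2, 2, 2, 3, 3, 3],
})

# Order rows by asset and index so that diff() compares consecutive indices.
table = table.sort_values(["Asset", "Index"])

# Within each asset, subtract the previous value; the first row has no
# predecessor, so its difference is set to 0.
table["Diff"] = table.groupby("Asset")["Value"].diff().fillna(0).astype(int)

# Restore the original row order.
table = table.sort_index()
print(table)
```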

Expand table by merging additional variables as columns

I have a dataset that looks like this but with several more binary outcome variables: 
* Example generated by -dataex-. To install: ssc install dataex
clear
input long ID byte(Region Type Tier Secure Offshore Highland)
120034 12 1 2 1 0 1
120035 12 1 2 1 0 1
120036 12 1 2 1 0 1
120037 12 1 2 1 0 1
120038 41 1 2 1 0 0
120039 41 2 2 1 1 0
120040 41 2 1 0 1 0
120041 41 2 1 0 1 0
120042 41 2 1 0 1 0
120043 41 2 1 0 0 .
120044 65 2 1 0 0 .
120045 65 3 1 0 0 0
120046 65 3 1 1 0 0
120047 65 3 2 1 1 0
120048 65 3 2 1 0 0
120049 65 3 2 . 1 1
120050 25 3 2 . 1 1
120051 25 5 2 . 1 1
120052 25 5 1 . 0 1
120053 25 5 2 . 0 .
120054 25 5 2 0 0 .
120055 25 5 1 0 . 0
120056 25 5 1 0 . 0
120057 95 7 1 0 1 0
120058 95 7 1 0 1 0
120059 95 7 1 1 1 0
120060 95 7 2 1 0 1
120061 95 7 2 1 0 1
120062 59 7 2 1 0 1
120063 95 8 2 0 . 1
120064 59 8 1 0 . 1
120065 59 8 1 0 . 0
120066 59 8 1 1 . 0
120067 59 8 1 1 1 0
120068 59 8 2 1 1 0
120069 40 9 2 1 1 1
120070 40 9 2 1 0 1
120071 40 9 2 1 0 1
120072 40 9 1 0 0 1
end
I am creating a table with the community-contributed command tabout:
foreach v of var Secure Offshore Highland{
    tabout Region Type Tier `v' using `v'.docx ///
    , replace ///
    style(docx) font(bold) c(freq row) ///
    f(1) layout(cb) landscape nlab(Obs) paper(A4)
    }
It includes row frequencies, percentages and totals.
However, I did not need all of this information, so I modified my code as follows:
foreach v of var Secure Offshore Highland{
    tabout Region Type Tier `v' using `v'.docx ///
    , replace ///
    style(docx) font(bold) c(freq row) ///
    f(1) layout(cb) h3(nil) h2(nil) dropc(2 3 4 5 7) landscape nlab(Obs) paper(A4)
    }
This produces what I need, but both versions of my code create a separate table for each of the three outcome variables. I have to manually combine the three tables into one, keeping the left-most column, the % of "1" column and the right-most column showing the row total.
Can anyone help me with the following:
Merging all the tables in one go, keeping the explanatory variable labels in the left-most column and the row total in the right-most column.
Instead of deleting all columns except the % of "1"s, I want to produce only the desired column. Deleting columns seems crude and dangerous.
Can I get the same output in Excel through putexcel? I tried following the wonderfully written blog post by Chuck Huber, but I cannot figure out the "merging" part.
I got this far thanks to a lot of studying, especially Ian Watson's "User Guide for tabout Version 3" and Nicholas Cox's "How to face lists with fortitude".
Cross-posted on Statalist.
You cannot do this readily with tabout -- custom tables require custom programming.
My advice is to create a matrix with whatever values you need and then use the (also) community-contributed command esttab to tabulate and export everything.
That said, what you want requires a lot of work but here is a simplified example based on your data:
matrix N = J(1, 2, .)
local i 0
foreach v in Region Type Tier {
    local i = `i' + 1
    tabulate `v' Secure, matcell(A`i')
    matrix arowsum = J(1, rowsof(A`i'), 1) * A`i'
    matrix A`i' = A`i' \ arowsum
    if `i' > 1 local N \ N
    matrix m1a = (nullmat(m1a) `N' \ A`i')
}
local i 0
foreach v in Region Type Tier {
    local i = `i' + 1
    tabulate `v' Offshore, matcell(B`i')
    matrix browsum = J(1, rowsof(B`i'), 1) * B`i'
    matrix B`i' = B`i' \ browsum
    if `i' > 1 local N \ N
    matrix m2a = (nullmat(m2a) `N' \ B`i')
}
local i 0
foreach v in Region Type Tier {
    local i = `i' + 1
    tabulate `v' Highland, matcell(C`i')
    matrix crowsum = J(1, rowsof(C`i'), 1) * C`i'
    matrix C`i' = C`i' \ crowsum
    if `i' > 1 local N \ N
    matrix m3a = (nullmat(m3a) `N' \ C`i')
}
matrix m1b = m1a * J(colsof(m1a), 1, 1)
matrix m2b = m2a * J(colsof(m2a), 1, 1)
matrix m3b = m3a * J(colsof(m3a), 1, 1)
matrix M1 = m1a, m1b
matrix M2 = m2a, m2b
matrix M3 = m3a, m3b
matrix K = J(1, 3, .)
matrix M = M1 \ K \ M2 \ K \ M3
You can then use esttab to export the results in Excel or Word:
esttab matrix(M)
---------------------------------------------------
M
c1 c2 c1
---------------------------------------------------
r1 0 4 4
r2 3 0 3
r3 1 3 4
r4 4 2 6
r5 2 4 6
r6 2 3 5
r7 3 3 6
r1 15 19 34
r1 . . .
r1 0 5 5
r2 5 1 6
r3 1 3 4
r4 3 0 3
r5 2 4 6
r6 3 3 6
r7 1 3 4
r1 15 19 34
r1 . . .
r1 13 4 17
r2 2 15 17
r1 15 19 34
r1 . . .
r1 . . .
r1 4 0 4
r2 3 2 5
r3 3 1 4
r4 2 4 6
r5 1 2 3
r6 4 2 6
r7 2 3 5
r1 19 14 33
r1 . . .
r1 5 0 5
r2 2 4 6
r3 3 3 6
r4 3 1 4
r5 3 3 6
r6 0 2 2
r7 3 1 4
r1 19 14 33
r1 . . .
r1 6 7 13
r2 13 7 20
r1 19 14 33
r1 . . .
r1 . . .
r1 0 4 4
r2 2 3 5
r3 0 4 4
r4 5 0 5
r5 4 2 6
r6 4 1 5
r7 3 3 6
r1 18 17 35
r1 . . .
r1 1 4 5
r2 4 0 4
r3 4 2 6
r4 2 2 4
r5 3 3 6
r6 4 2 6
r7 0 4 4
r1 18 17 35
r1 . . .
r1 13 3 16
r2 5 14 19
r1 18 17 35
---------------------------------------------------
You will have to generate the rest of the elements you want separately (including column and row names etc.) but the idea is the same. You will also have to play with the options in esttab to fine tune the desired final outcome.
Note that the above can be written more efficiently but I have kept everything separate in this answer so you can understand it.
EDIT:
If you are working with matrices as above you can also use putexcel easily:
putexcel A1 = matrix(M)

Build a static outputTable (similar to a pivot table) with shiny

I have a table that has the following data (shortened for this example):
C1 C2 C3
1 0 1 1
2 1 1 0
3 1 0 1
4 1 1 1
5 0 0 1
6 0 0 0
I want to create a query that gives me the following result:
C1
C2 sum(C3)
It's similar to a pivot table, but static.
Could you help me, please? I'd be grateful.

sum data in multiple CSV's based on row and column python

I have multiple CSV files, and I want to sum the data in these two files based on row and column keys.
Example:
input1.csv
account,param1,param2,param3
D1,2,-1,0
D2,3,2,-2
D4,12,-1,-2
D3,1,1,0
input2.csv
account,param1,param2,param3
D4,22,-1,0
D6,3,2,-2
D1,-2,-1,-2
D3,1,1,0
output.csv
account,param1,param2,param3
D1,0,-2,-2
D2,3,2,-2
D3,2,2,0
D4,34,-2,-2
D6,3,2,-2
So output.csv needs to contain every account present in either CSV, and for accounts common to both, the param values need to be added together.
Note: the accounts are not in sorted order.
Here's one way using pd.concat:
In [824]: df = pd.concat((pd.read_csv(f) for f in ['input1.csv', 'input2.csv']), ignore_index=True)
In [825]: df
Out[825]:
account param1 param2 param3
0 D1 2 -1 0
1 D2 3 2 -2
2 D4 12 -1 -2
3 D3 1 1 0
4 D4 22 -1 0
5 D6 3 2 -2
6 D1 -2 -1 -2
7 D3 1 1 0
In [826]: df.groupby('account', as_index=False).sum()
Out[826]:
account param1 param2 param3
0 D1 0 -2 -2
1 D2 3 2 -2
2 D3 2 2 0
3 D4 34 -2 -2
4 D6 3 2 -2
In [827]: df.groupby('account', as_index=False).sum().to_csv('output.csv', index=False)
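For comparison, here is a self-contained sketch of the same sum that runs without files on disk. It uses in-memory strings as stand-ins for input1.csv and input2.csv, and it also shows a different technique, DataFrame.add with fill_value=0, next to the concat and groupby approach. The names a, b, total and grouped are made up for illustration.

```python
import io
import pandas as pd

# In-memory stand-ins for input1.csv and input2.csv from the question.
input1 = io.StringIO(
    "account,param1,param2,param3\n"
    "D1,2,-1,0\nD2,3,2,-2\nD4,12,-1,-2\nD3,1,1,0\n"
)
input2 = io.StringIO(
    "account,param1,param2,param3\n"
    "D4,22,-1,0\nD6,3,2,-2\nD1,-2,-1,-2\nD3,1,1,0\n"
)

a = pd.read_csv(input1, index_col="account")
b = pd.read_csv(input2, index_col="account")

# Alternative technique: align on account and add; an account missing from
# one file counts as 0 there. Alignment produces floats, so cast back.
total = a.add(b, fill_value=0).astype("int64").sort_index()

# The concat + groupby approach from the answer, for comparison.
grouped = pd.concat([a.reset_index(), b.reset_index()],
                    ignore_index=True).groupby("account").sum()

print(total.to_csv())
```

The add variant only handles two frames at a time, whereas concat followed by groupby extends naturally to any number of files.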

SAS IF then statement

Hello, for whatever reason the IF-THEN statement in this code will not work. What I am trying to do is this: if the salary is less than or equal to 30,000, set the new variable income to 'low'. Here is what I have so far.
data newdd2;
input subject group $ year salary : comma7. @@;
IF (salary <= 30,000) THEN income = 'low';
datalines;
1 A 2 53,900 2 B 2 37,400 3 A 1 49,500
4 C 2 43,900 5 B 3 38,400 6 A 3 39,500
7 A 3 53,600 8 B 2 37,700 9 C 1 49,900
10 C 2 43,300 11 B 3 57,400 12 B 3 39,500
13 B 1 33,900 14 A 2 41,400 15 C 2 49,500
16 C 1 43,900 17 B 1 39,400 18 A 3 39,900
19 A 2 53,600 20 A 2 37,700 21 C 3 42,900
22 C 2 43,300 23 B 1 57,400 24 C 3 69,500
25 C 2 33,900 26 A 2 35,300 27 A 2 47,500
28 C 2 43,900 29 B 3 38,400 30 A 1 32,500
31 A 3 53,600 32 B 2 37,700 33 C 1 41,600
34 C 2 43,300 35 B 3 57,400 36 B 3 39,500
37 B 2 33,900 38 A 2 41,400 39 C 2 79,500
40 C 1 43,900 41 C 1 29,500 42 A 3 39,900
43 A 2 53,600 44 A 2 37,500 45 C 3 42,900
46 C 2 43,300 47 B 1 47,400 48 C 3 59,500
run;
The message I keep getting is "The work dataset may be incomplete". I believe my code is correct and I've tried a number of things, but no success yet. Thanks in advance.
You cannot use a comma in a numeric literal.
IF (salary <= 30000) THEN income = 'low';