subtract two columns of different Dataframe with python - python-2.7

I have two DataFrames, df1:
Lat1 Lon1 tp1
0 34.475000 349.835000 1
1 34.476920 349.862065 0.5
2 34.478833 349.889131 0
3 34.480739 349.916199 3
4 34.482639 349.943268 0
5 34.484532 349.970338 0
and df2:
Lat2 Lon2 tp2
0 34.475000 349.835000 2
1 34.476920 349.862065 1
2 34.478833 349.889131 0
3 34.480739 349.916199 6
4 34.482639 349.943268 0
5 34.484532 349.970338 0
I want to subtract the tp columns (tp1 - tp2) and create a new DataFrame whose columns are Lat1, Lon1 and tp1-tp2. Does anyone know how I can do it?

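import pandas as pd
# Copy the coordinate columns so that adding a column does not trigger SettingWithCopyWarning
df3 = df1[['Lat1', 'Lon1']].copy()
# The subtraction aligns on the row index, so df1 and df2 must list the same points in the same order
df3['tp1-tp2'] = df1.tp1 - df2.tp2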
Out[97]:
Lat1 Lon1 tp1-tp2
0 34.4750 349.8350 -1.0
1 34.4769 349.8621 -0.5
2 34.4788 349.8891 0.0
3 34.4807 349.9162 -3.0
4 34.4826 349.9433 0.0
5 34.4845 349.9703 0.0
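The subtraction above pairs rows by index label, so it relies on both frames listing the same points in the same order. If that is not guaranteed, one option is to merge on the coordinates first. The following is only a sketch with a shortened copy of the sample data, and it assumes the latitude/longitude values are exactly equal floats in both frames:

```python
import pandas as pd

df1 = pd.DataFrame({
    "Lat1": [34.475000, 34.476920, 34.478833],
    "Lon1": [349.835000, 349.862065, 349.889131],
    "tp1": [1.0, 0.5, 0.0],
})
# Same points as df1, deliberately listed in a different row order.
df2 = pd.DataFrame({
    "Lat2": [34.478833, 34.475000, 34.476920],
    "Lon2": [349.889131, 349.835000, 349.862065],
    "tp2": [0.0, 2.0, 1.0],
})

# Pair rows by coordinates rather than by position in the frame.
merged = df1.merge(df2, left_on=["Lat1", "Lon1"], right_on=["Lat2", "Lon2"], how="inner")

df3 = merged[["Lat1", "Lon1"]].copy()
df3["tp1-tp2"] = merged["tp1"] - merged["tp2"]
print(df3)
```

If the coordinates come from different computations and may differ in the last digits, rounding both sides before merging would be needed; that case is not handled here.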

Related

How to plot shade red according to ratio variable using sgpanel plot

I would like to plot a dataset and obtain the desired output with the right setup:
1. Plot the scatter so that the points are shaded red, from light red to dark red, depending on the ratio on a scale of 0-1 (0 = light red, 1 = dark red).
2. Show a legend with the same red colour scale for the ratio 0-1 (as in point 1).
Data explanation:
area - city (shortcut)
id - user id
var - variable
time - datetime
exit - consumer left
ratio - proportion (between 0-1)
Data sample and my plotting attempt (obviously not correct):
data data;
input area $ id $ var $ time $ exit $ ratio $;
datalines;
A 1 1 1 0 0.18
A 1 1 2 0 0.11
A 2 1 1 1 0.14
A 2 1 2 0 0.15
A 2 1 3 0 0.14
A 3 1 1 0 0.17
A 3 1 2 0 0.19
A 3 1 3 1 0.21
A 3 1 4 0 0.14
B 4 2 1 0 0.14
B 4 2 2 1 0.15
B 5 2 1 0 0.17
B 5 2 2 0 0.25
B 5 2 3 0 0.31
A 1 3 1 0 0.22
A 1 3 2 0 0.13
A 2 3 1 1 0.16
A 2 3 2 0 0.11
A 2 3 3 0 0.22
A 3 3 1 0 0.27
A 3 3 2 0 0.29
A 3 3 3 1 0.31
A 3 3 4 0 0.24
B 4 4 1 0 0.24
B 4 4 2 1 0.35
B 5 4 1 0 0.47
B 5 4 2 0 0.15
B 5 4 3 0 0.21
;;
run;
data attrs;
input id $ risk $ fillcolor $;
datalines;
ratio 0.05 Verylightred
ratio 0.15 Lightred
ratio 0.20 Red
ratio 0.25 Darkred
ratio 0.30 Verydarkred
ratio 0.35 Verydarkstrongred
;
run;
proc sgpanel data=data dattrmap=attrs;
panelby area exit;
scatter y=id x=var / markerattrs = (symbol = squarefilled) group=ratio attrid=ratio;
run;
This will get you closer.
Ratio should be numeric to be graphed.
Ratio is continuous, so how should it be used to group?
In the data attribute map, the length of the colour variable is not long enough to hold the colour names, and risk should be numeric.
I don't know exactly how to specify the ranges and colours you'd like, but this gets you closer using the automatic legend.
One way to get at this is to add a grouping variable to the data set; you can then control the colour of each group with the data attribute map. This would mean adding a column called ratio_group to the 'data' data set, which maps to the values in the data attribute map table. Use that variable as the group.
data data;
input area $ id $ var $ time $ exit $ ratio ;
datalines;
A 1 1 1 0 0.18
A 1 1 2 0 0.11
A 2 1 1 1 0.14
A 2 1 2 0 0.15
A 2 1 3 0 0.14
A 3 1 1 0 0.17
A 3 1 2 0 0.19
A 3 1 3 1 0.21
A 3 1 4 0 0.14
B 4 2 1 0 0.14
B 4 2 2 1 0.15
B 5 2 1 0 0.17
B 5 2 2 0 0.25
B 5 2 3 0 0.31
A 1 3 1 0 0.22
A 1 3 2 0 0.13
A 2 3 1 1 0.16
A 2 3 2 0 0.11
A 2 3 3 0 0.22
A 3 3 1 0 0.27
A 3 3 2 0 0.29
A 3 3 3 1 0.31
A 3 3 4 0 0.24
B 4 4 1 0 0.24
B 4 4 2 1 0.35
B 5 4 1 0 0.47
B 5 4 2 0 0.15
B 5 4 3 0 0.21
;;
run;
proc sgpanel data=data ;
panelby area exit;
scatter y=id x=var / markerattrs = (symbol = squarefilled size=10)
colorresponse=ratio
colormodel=(verylightred lightred red darkred verydarkred verydarkstrongred);
colaxis grid minorgrid;
rowaxis grid minorgrid;
run;
For marker size look at the SIZE option under the MARKERATTRS option.
For grids, look at the GRID/MINORGRID options under the COLAXIS and ROWAXIS statements.
COLAXIS documentation

pandas.DataFrame: How to div row by row [python]

I want to divide row[i] by row[i+1] in a pandas.DataFrame:
row[i] = row[i] / row[i+1]
for example:
1 2 3 4
4 2 6 2
8 5 3 1
the result is
0.25 1 0.5 2
0.5 0.4 2 2
You can divide by the shifted DataFrame using div, then remove the resulting NaN row with dropna:
print (df)
a b c d
0 1 2 3 4
1 4 2 6 2
2 8 5 3 1
print (df.div(df.shift(-1), axis=1))
a b c d
0 0.25 1.0 0.5 2.0
1 0.50 0.4 2.0 2.0
2 NaN NaN NaN NaN
result = df.div(df.shift(-1), axis=1).dropna(how='all')
print (result)
a b c d
0 0.25 1.0 0.5 2.0
1 0.50 0.4 2.0 2.0
Another way to remove the last row is to select with iloc:
df = df.div(df.shift(-1), axis=1).iloc[:-1]
print (df)
a b c d
0 0.25 1.0 0.5 2.0
1 0.50 0.4 2.0 2.0
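For reference, here is a self-contained sketch that builds the example frame from the question and applies the shift-and-divide approach in one go. The variable name ratios is only illustrative:

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, 4], [4, 2, 6, 2], [8, 5, 3, 1]],
    columns=list("abcd"),
)

# shift(-1) moves every row up by one position, so row i is divided by
# the original row i+1. The last row has no successor and becomes NaN,
# which iloc[:-1] drops.
ratios = df.div(df.shift(-1)).iloc[:-1]
print(ratios)
```

Keeping the result in a separate variable leaves the original df untouched, which matters if both variants are tried in the same session.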

how to write a matrix to a file in python with this format?

I need to write a matrix to a file in the format (i, j, a[i,j]), row by row, but I don't know how to do it. I tried np.savetxt(f, A, fmt='%1d', newline='\n'), but it writes only the matrix values and not i and j!
import numpy as np
a = 10 * np.arange(12).reshape(4,3)  # multiples of 10, easy to tell apart from the indices
a_with_index = np.array([idx+(val,) for idx, val in np.ndenumerate(a)])
np.savetxt('/tmp/out', a_with_index, fmt='%d')
writes to /tmp/out the contents
0 0 0
0 1 10
0 2 20
1 0 30
1 1 40
1 2 50
2 0 60
2 1 70
2 2 80
3 0 90
3 1 100
3 2 110
If your array's dtype is not an integer type, you'll probably have to write your own function to save it along with its indices, since the indices are integers. For example:
import numpy as np

def savetxt_with_indices(filename, arr, fmt):
    nrows, ncols = arr.shape
    # Integer array holding the (row, column) index of every element
    indexes = np.empty((nrows*ncols, 2), dtype=int)
    indexes[:,0] = np.repeat(np.arange(nrows), ncols)
    indexes[:,1] = np.tile(np.arange(ncols), nrows)
    # Prefix the caller's value format with two integer index fields
    fmt = '%4d %4d ' + fmt
    flat_arr = arr.flatten()
    with open(filename, 'w') as fo:
        for i in range(nrows*ncols):
            print(fmt % (indexes[i, 0], indexes[i, 1], flat_arr[i]), file=fo)

A = np.arange(12.).reshape((4,3))
savetxt_with_indices('test.txt', A, '%6.2f')
0 0 0.00
0 1 1.00
0 2 2.00
1 0 3.00
1 1 4.00
1 2 5.00
2 0 6.00
2 1 7.00
2 2 8.00
3 0 9.00
3 1 10.00
3 2 11.00
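A shorter variant is possible because np.savetxt accepts one format per column. The sketch below is an alternative to the hand-written loop, not a replacement for it. The helper name index_value_rows and the output file name are made up for illustration, and it assumes a 2-D array:

```python
import os
import tempfile

import numpy as np


def index_value_rows(arr):
    """Return an (n, 3) array whose rows are (i, j, arr[i, j])."""
    # np.indices gives one grid of row numbers and one of column numbers.
    rows, cols = np.indices(arr.shape)
    return np.column_stack([rows.ravel(), cols.ravel(), arr.ravel()])


A = np.arange(12.).reshape((4, 3))
table = index_value_rows(A)

# Hypothetical output location for this example.
out_path = os.path.join(tempfile.gettempdir(), 'matrix_with_indices.txt')

# One format per column: integer indices, two-decimal values.
np.savetxt(out_path, table, fmt=['%d', '%d', '%.2f'])
```

Because the indices and values share one float array here, very large index values would lose integer precision; the loop-based function does not have that limitation.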

How to add a number to a portion of dataframe column in pandas?

I have a dataframe with two columns A and B.
A B
1 0
2 0
3 1
4 2
5 0
6 3
What I want to do is add column A to column B, but only where column B is non-zero, and put the result in column B.
A B
1 0
2 0
3 4
4 6
5 0
6 9
Thank you in advance for your help and suggestions.
Use .loc with a boolean mask:
In [49]:
df.loc[df['B'] != 0, 'B'] = df['A'] + df['B']
df
Out[49]:
A B
0 1 0
1 2 0
2 3 4
3 4 6
4 5 0
5 6 9
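An equivalent way to express the same condition is Series.where, which keeps values where the condition holds and takes the replacement elsewhere. This is a minimal sketch on the sample data from the question:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4, 5, 6], "B": [0, 0, 1, 2, 0, 3]})

# Keep B where it is zero; everywhere else replace it with A + B.
df["B"] = df["B"].where(df["B"] == 0, df["A"] + df["B"])
print(df)
```

The .loc form modifies only the selected rows in place, while where builds a whole new column; for a frame of this size the choice is a matter of taste.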

pandas pivot table using index data of dataframe

I want to create a pivot table from a pandas dataframe
using dataframe.pivot()
and include not only dataframe columns but also the data within the dataframe index.
Couldn't find any docs that show how to do that.
Any tips?
Use reset_index to make the index a column:
In [45]: df = pd.DataFrame({'y': [0, 1, 2, 3, 4, 4], 'x': [1, 2, 2, 3, 1, 3]}, index=np.arange(6)*10)
In [46]: df
Out[46]:
x y
0 1 0
10 2 1
20 2 2
30 3 3
40 1 4
50 3 4
In [47]: df.reset_index()
Out[47]:
index x y
0 0 1 0
1 10 2 1
2 20 2 2
3 30 3 3
4 40 1 4
5 50 3 4
So pivot uses the index as values:
In [48]: df.reset_index().pivot(index='y', columns='x')
Out[48]:
index
x 1 2 3
y
0 0 NaN NaN
1 NaN 10 NaN
2 NaN 20 NaN
3 NaN NaN 30
4 40 NaN 50
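If only the former index is wanted as the cell values, it can be named explicitly through the values argument, which also avoids the extra 'index' level in the column labels. A self-contained sketch of the same example (the variable name table is illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'y': [0, 1, 2, 3, 4, 4], 'x': [1, 2, 2, 3, 1, 3]},
                  index=np.arange(6) * 10)

# reset_index() turns the index into a regular column named 'index';
# values='index' then yields flat column labels 1, 2, 3.
table = df.reset_index().pivot(index='y', columns='x', values='index')
print(table)
```

Note that pivot raises an error if any (y, x) pair occurs more than once; pivot_table with an aggregation function would be needed in that case.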