pandas.DataFrame: How to div row by row [python]

pandas.DataFrame: How to div row by row [python] - python-2.7

I want to div row[i] by row[i+1] in pandas.DataFrame
row[i] = row[i+1] / row[i]
for example:
1 2 3 4
4 2 6 2
8 5 3 1
the result is
0.25 1 0.5 2
0.5 0.4 2 2

You can divide by div shifted DataFrame, last remove NaN row by dropna:
print (df)
a b c d
0 1 2 3 4
1 4 2 6 2
2 8 5 3 1
print (df.div(df.shift(-1), axis=1))
a b c d
0 0.25 1.0 0.5 2.0
1 0.50 0.4 2.0 2.0
2 NaN NaN NaN NaN
df = df.div(df.shift(-1), axis=1).dropna(how='all')
print (df)
a b c d
0 0.25 1.0 0.5 2.0
1 0.50 0.4 2.0 2.0
Another solution for remove last row is select by iloc:
df = df.div(df.shift(-1), axis=1).iloc[:-1]
print (df)
a b c d
0 0.25 1.0 0.5 2.0
1 0.50 0.4 2.0 2.0

Related

How to plot shade red according to ratio variable using sgpanel plot

I would like to plot dataset and obtain desired output with the right setup.
Plot the scatter such that the points are in shade red-color, from light red to dark red depending on the scale (ratio) of 0-1 (0=light red, 1=dark red).
Show the legend also showing the scale red color according to the ration 0-1 (point 1.)
Data explanation:
area - city (shortcut)
id - user id
var - variable
time - datetime
exit - consumer left
ratio - proportion (between 0-1)
Data sample and attempt plotting (obviously not correct):
data data;
input area $ id $ var $ time $ exit $ ratio $;
datalines;
A 1 1 1 0 0.18
A 1 1 2 0 0.11
A 2 1 1 1 0.14
A 2 1 2 0 0.15
A 2 1 3 0 0.14
A 3 1 1 0 0.17
A 3 1 2 0 0.19
A 3 1 3 1 0.21
A 3 1 4 0 0.14
B 4 2 1 0 0.14
B 4 2 2 1 0.15
B 5 2 1 0 0.17
B 5 2 2 0 0.25
B 5 2 3 0 0.31
A 1 3 1 0 0.22
A 1 3 2 0 0.13
A 2 3 1 1 0.16
A 2 3 2 0 0.11
A 2 3 3 0 0.22
A 3 3 1 0 0.27
A 3 3 2 0 0.29
A 3 3 3 1 0.31
A 3 3 4 0 0.24
B 4 4 1 0 0.24
B 4 4 2 1 0.35
B 5 4 1 0 0.47
B 5 4 2 0 0.15
B 5 4 3 0 0.21
;;
run;
data attrs;
input id $ risk $ fillcolor $;
datalines;
ratio 0.05 Verylightred
ratio 0.15 Lightred
ratio 0.20 Red
ratio 0.25 Darkred
ratio 0.30 Verydarkred
ratio 0.35 Verydarkstrongred
;
run;
proc sgpanel data=data dattrmap=attrs;
panelby area exit;
scatter y=id x=var / markerattrs = (symbol = squarefilled) group=ratio attrid=ratio;
run;

This will get you closer.
Ratio should be numeric to be graphed
Ratio is continuous, how should it be used to group?
For the colour on the data attribute map, the length of the colours is not long enough and risk should be numeric
I don't know exactly how to specify the ranges you'd like for the colours you'd like but this gets you closer using the automatic legend.
One way to get at this is to add the variable to the data set for each group and then you can control the colour of each group with the data attribute map. This would mean adding a column in the 'data' data set called ratio_group whcih maps to the values in the data attribute map table. Use that variable the group.
data data;
input area $ id $ var $ time $ exit $ ratio ;
datalines;
A 1 1 1 0 0.18
A 1 1 2 0 0.11
A 2 1 1 1 0.14
A 2 1 2 0 0.15
A 2 1 3 0 0.14
A 3 1 1 0 0.17
A 3 1 2 0 0.19
A 3 1 3 1 0.21
A 3 1 4 0 0.14
B 4 2 1 0 0.14
B 4 2 2 1 0.15
B 5 2 1 0 0.17
B 5 2 2 0 0.25
B 5 2 3 0 0.31
A 1 3 1 0 0.22
A 1 3 2 0 0.13
A 2 3 1 1 0.16
A 2 3 2 0 0.11
A 2 3 3 0 0.22
A 3 3 1 0 0.27
A 3 3 2 0 0.29
A 3 3 3 1 0.31
A 3 3 4 0 0.24
B 4 4 1 0 0.24
B 4 4 2 1 0.35
B 5 4 1 0 0.47
B 5 4 2 0 0.15
B 5 4 3 0 0.21
;;
run;
proc sgpanel data=data ;
panelby area exit;
scatter y=id x=var / markerattrs = (symbol = squarefilled size=10)
colorresponse=ratio
colormodel=(verylightred lightred red darkred verydarkred verydarkstrongred);
colaxis grid minorgrid;
rowaxis grid minorgrid;
run;
For marker size look at the SIZE option under the MARKERATTRS option.
For grids, look at the GRID/MINORGRID options under the COLAXIS and ROWAXIS statements.
COLAXIS documentation

Ranking each row to create mask dataframe

Sample dataframe df is defined as follows:
import numpy as np
import pandas as pd
df = pd.DataFrame(10*(2+np.random.randn(500, 8)), columns=list('ABCDEFGH'))
Within each row rank the top 5 columns and mark them as 1 and rest as nan.
df looked like
df.head()
A B C D E F G H
0 6.598436 44.318800 18.064752 13.418329 17.145434 6.696975 14.757765 8.797826
1 3.593140 14.571717 16.292330 28.390669 35.289606 -4.273124 20.519388 25.137833
2 36.777253 34.360523 28.020462 15.356690 22.038938 14.960303 15.225555 34.691981
3 18.623122 27.184421 -5.320215 31.694895 21.156375 9.947077 20.257575 21.035659
4 11.864725 30.458160 13.509029 27.037195 20.581043 25.371691 1.094735 28.703618
Desired output is:
df_output.head()
A B C D E F G H
0 nan 1 1 1 1 nan 1 nan
1 nan nan 1 1 1 nan 1 1
2 1 1 1 nan 1 nan nan 1
3 nan 1 nan 1 1 nan 1 1
4 nan 1 nan 1 1 1 nan 1

df_output = df.rank(1, ascending=False, method='first')
df_output[df_output > 5] = np.nan
df_output[df_output <= 5] = 1.0

Merging multiple .txt files into a csv

*New to Python.
I'm trying to merge multiple text files into 1 csv; example below -
filename.csv
Alpha
0
0.1
0.15
0.2
0.25
0.3
text1.txt
Alpha,Beta
0,10
0.2,20
0.3,30
text2.txt
Alpha,Charlie
0.1,5
0.15,15
text3.txt
Alpha,Delta
0.1,10
0.15,20
0.2,50
0.3,10
Desired output in the csv file: -
filename.csv
Alpha Beta Charlie Delta
0 10 0 0
0.1 0 5 10
0.15 0 15 20
0.2 20 0 50
0.25 0 0 0
0.3 30 0 10
The code I've been working with and others that were provided give me an answer similar to what is at the bottom of the page
def mergeData(indir="Dir Path", outdir="Dir Path"):
dfs = []
os.chdir(indir)
fileList=glob.glob("*.txt")
for filename in fileList:
left= "/Path/Final.csv"
right = filename
output = "/Path/finalMerged.csv"
leftDf = pandas.read_csv(left)
rightDf = pandas.read_csv(right)
mergedDf = pandas.merge(leftDf,rightDf,how='inner',on="Alpha", sort=True)
dfs.append(mergedDf)
outputDf = pandas.concat(dfs, ignore_index=True)
outputDf = pandas.merge(leftDf, outputDf, how='inner', on='Alpha', sort=True, copy=False).fillna(0)
print (outputDf)
outputDf.to_csv(output, index=0)
mergeData()
The answer I get however is instead of the desired result: -
Alpha Beta Charlie Delta
0 10 0 0
0.1 0 5 0
0.1 0 0 10
0.15 0 15 0
0.15 0 0 20
0.2 20 0 0
0.2 0 0 50
0.25 0 0 0
0.3 30 0 0
0.3 0 0 10

IIUC you can create list of all DataFrames - dfs, in loop append mergedDf and last concat all DataFrames to one:
import pandas
import glob
import os
def mergeData(indir="dir/path", outdir="dir/path"):
dfs = []
os.chdir(indir)
fileList=glob.glob("*.txt")
for filename in fileList:
left= "/path/filename.csv"
right = filename
output = "/path/filename.csv"
leftDf = pandas.read_csv(left)
rightDf = pandas.read_csv(right)
mergedDf = pandas.merge(leftDf,rightDf,how='right',on="Alpha", sort=True)
dfs.append(mergedDf)
outputDf = pandas.concat(dfs, ignore_index=True)
#add missing rows from leftDf (in sample Alpha - 0.25)
#fill NaN values by 0
outputDf = pandas.merge(leftDf,outputDf,how='left',on="Alpha", sort=True).fillna(0)
#columns are converted to int
outputDf[['Beta', 'Charlie']] = outputDf[['Beta', 'Charlie']].astype(int)
print (outputDf)
outputDf.to_csv(output, index=0)
mergeData()
Alpha Beta Charlie
0 0.00 10 0
1 0.10 0 5
2 0.15 0 15
3 0.20 20 0
4 0.25 0 0
5 0.30 30 0
EDIT:
Problem is you change parameter how='left' in second merge to how='inner':
def mergeData(indir="Dir Path", outdir="Dir Path"):
dfs = []
os.chdir(indir)
fileList=glob.glob("*.txt")
for filename in fileList:
left= "/Path/Final.csv"
right = filename
output = "/Path/finalMerged.csv"
leftDf = pandas.read_csv(left)
rightDf = pandas.read_csv(right)
mergedDf = pandas.merge(leftDf,rightDf,how='inner',on="Alpha", sort=True)
dfs.append(mergedDf)
outputDf = pandas.concat(dfs, ignore_index=True)
#need left join, not inner
outputDf = pandas.merge(leftDf, outputDf, how='left', on='Alpha', sort=True, copy=False)
.fillna(0)
print (outputDf)
outputDf.to_csv(output, index=0)
mergeData()
Alpha Beta Charlie Delta
0 0.00 10.0 0.0 0.0
1 0.10 0.0 5.0 0.0
2 0.10 0.0 0.0 10.0
3 0.15 0.0 15.0 0.0
4 0.15 0.0 0.0 20.0
5 0.20 20.0 0.0 0.0
6 0.20 0.0 0.0 50.0
7 0.25 0.0 0.0 0.0
8 0.30 30.0 0.0 0.0
9 0.30 0.0 0.0 10.0

import pandas as pd
data1 = pd.read_csv('samp1.csv',sep=',')
data2 = pd.read_csv('samp2.csv',sep=',')
data3 = pd.read_csv('samp3.csv',sep=',')
df1 = pd.DataFrame({'Alpha':data1.Alpha})
df2 = pd.DataFrame({'Alpha':data2.Alpha,'Beta':data2.Beta})
df3 = pd.DataFrame({'Alpha':data3.Alpha,'Charlie':data3.Charlie})
mergedDf = pd.merge(df1, df2, how='outer', on ='Alpha',sort=False)
mergedDf1 = pd.merge(mergedDf, df3, how='outer', on ='Alpha',sort=False)
a = pd.DataFrame(mergedDf1)
print(a.drop_duplicates())
output:
Alpha Beta Charlie
0 0.00 10.0 NaN
1 0.10 NaN 5.0
2 0.15 NaN 15.0
3 0.20 20.0 NaN
4 0.25 NaN NaN
5 0.30 30.0 NaN

how to write a matrix to a file in python with this format?

I need to write a matrix to a file with this format (i, j, a[i,j]) row by row, but I don't know how to get it. I tried with: np.savetxt(f, A, fmt='%1d', newline='\n'), but it write only matrix values and don't write i, j!

import numpy as np
a = np.arange(12).reshape(4,3)
a_with_index = np.array([idx+(val,) for idx, val in np.ndenumerate(a)])
np.savetxt('/tmp/out', a_with_index, fmt='%d')
writes to /tmp/out the contents
0 0 0
0 1 10
0 2 20
1 0 30
1 1 40
1 2 50
2 0 60
2 1 70
2 2 80
3 0 90
3 1 100
3 2 110

If your array datatype is not a sort of integer, you'll probably have to write your own function to save it along with its indices, since these are integers. For example,
import numpy as np
def savetxt_with_indices(filename, arr, fmt):
nrows, ncols = arr.shape
indexes = np.empty((nrows*ncols, 2))
indexes[:,0] = np.repeat(np.arange(nrows), ncols)
indexes[:,1] = np.tile(np.arange(ncols), nrows)
fmt = '%4d %4d ' + fmt
flat_arr = arr.flatten()
with open(filename, 'w') as fo:
for i in range(nrows*ncols):
print(fmt % (indexes[i, 0], indexes[i, 1], flat_arr[i]), file=fo)
A = np.arange(12.).reshape((4,3))
savetxt_with_indices('test.txt', A, '%6.2f')
0 0 0.00
0 1 1.00
0 2 2.00
1 0 3.00
1 1 4.00
1 2 5.00
2 0 6.00
2 1 7.00
2 2 8.00
3 0 9.00
3 1 10.00
3 2 11.00

subtract two columns of different Dataframe with python

I have two DataFrames, df1:
Lat1 Lon1 tp1
0 34.475000 349.835000 1
1 34.476920 349.862065 0.5
2 34.478833 349.889131 0
3 34.480739 349.916199 3
4 34.482639 349.943268 0
5 34.484532 349.970338 0
and df2:
Lat2 Lon2 tp2
0 34.475000 349.835000 2
1 34.476920 349.862065 1
2 34.478833 349.889131 0
3 34.480739 349.916199 6
4 34.482639 349.943268 0
5 34.484532 349.970338 0
I want to substract (tp1-tp2) columns and create a new dataframe whose colums are Lat1,lon1,tp1-tp2. anyone know how can I do it?

import pandas as pd
df3 = df1[['Lat1', 'Lon1']]
df3['tp1-tp2'] = df1.tp1 - df2.tp2
Out[97]:
Lat1 Lon1 tp1-tp2
0 34.4750 349.8350 -1.0
1 34.4769 349.8621 -0.5
2 34.4788 349.8891 0.0
3 34.4807 349.9162 -3.0
4 34.4826 349.9433 0.0
5 34.4845 349.9703 0.0

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

pandas.DataFrame: How to div row by row [python] - python-2.7

I want to div row[i] by row[i+1] in pandas.DataFrame row[i] = row[i+1] / row[i] for example: 1 2 3 4 4 2 6 2 8 5 3 1 the result is 0.25 1 0.5 2 0.5 0.4 2 2

Related

How to plot shade red according to ratio variable using sgpanel plot

Ranking each row to create mask dataframe

Merging multiple .txt files into a csv

how to write a matrix to a file in python with this format?

subtract two columns of different Dataframe with python

Categories

Resources