Is there a way to plot multiple dataframe columns on one plot, with several subplots for the dataframe?
E.g. If df has 12 data columns, on subplot 1, plot columns 1-3, subplot 2, columns 4-6, etc.
I understand how to use df.plot to have one subplot for each column, but am not sure how to group as specified above.
Thanks!
This is an example of how I do it:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
fig, axes = plt.subplots(1, 2)
np.random.seed([3,1415])
df = pd.DataFrame(np.random.randn(100, 6), columns=list('ABCDEF'))
df = df.div(100).add(1.01).cumprod()
df.iloc[:, :3].plot(ax=axes[0])
df.iloc[:, 3:].plot(ax=axes[1])
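For the 12-column case in the question, the same idea can be put in a loop. This is only a sketch: the column names `c1`..`c12`, the random data, and the `group_size` variable are made up for illustration.

```python
import math

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data: 12 columns named c1..c12 (the names are an assumption).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((100, 12)),
                  columns=[f"c{i}" for i in range(1, 13)])

group_size = 3  # number of columns drawn on each subplot
n_groups = math.ceil(df.shape[1] / group_size)

# squeeze=False always returns a 2-D array of axes, even for a single subplot.
fig, axes = plt.subplots(1, n_groups, figsize=(4 * n_groups, 3), squeeze=False)

for i, ax in enumerate(axes.ravel()):
    start = i * group_size
    # Select the i-th block of columns by position and draw it on its own axes.
    df.iloc[:, start:start + group_size].plot(ax=ax)
```

Call `plt.show()` afterwards to display the figure. If the column count is not a multiple of `group_size`, the last subplot simply gets fewer lines.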
My dat.csv is as follows:
State,Pop
AP,100
UP,200
TN,90
I want to plot it and so my code is as follows:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('dat.csv')
df.plot(kind='bar').set_xticklabels(df.State)
plt.show()
However, I want to replace the labels which are in another csv file,
labels.csv
Column,Name,Level,Rename
State,AP,AP,Andhra Pradesh
State,TN,TN,Tamil Nadu
State,UP,UP,Uttar Pradesh
Is it possible for me to replace the labels in the plot with the labels in my labels.csv file?
Using merge + set_index
labels = pd.read_csv('labels.csv')
df = df.merge(labels, left_on='State', right_on='Name', how='left')
df
Out[1094]:
State Pop Column Name Level Rename
0 AP 100 State AP AP Andhra Pradesh
1 UP 200 State UP UP Uttar Pradesh
2 TN 90 State TN TN Tamil Nadu
df.set_index('Rename')['Pop'].plot(kind='bar')
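If you would rather not carry the extra columns from the merge, a lookup with `Series.map` might also work. A rough sketch, assuming the two files look like the samples in the question (inlined here as strings so it runs on its own, and with no space after the comma in the `State,Pop` header); the `Label` column name is my own:

```python
import io

import pandas as pd
import matplotlib.pyplot as plt

# Inline copies of the sample files from the question (assumed contents).
dat_csv = "State,Pop\nAP,100\nUP,200\nTN,90\n"
labels_csv = (
    "Column,Name,Level,Rename\n"
    "State,AP,AP,Andhra Pradesh\n"
    "State,TN,TN,Tamil Nadu\n"
    "State,UP,UP,Uttar Pradesh\n"
)

df = pd.read_csv(io.StringIO(dat_csv))
labels = pd.read_csv(io.StringIO(labels_csv))

# Series indexed by the short code, holding the long name.
rename_map = labels.set_index("Name")["Rename"]

# Look up each state; fall back to the original code when no label exists.
df["Label"] = df["State"].map(rename_map).fillna(df["State"])

ax = df.set_index("Label")["Pop"].plot(kind="bar")
```

The `fillna` fallback keeps any state that is missing from labels.csv on the plot under its original code instead of showing NaN.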
I have following string value obtained from a pandas dataframe.
u'1:19 AM Eastern, Tuesday, May 16, 2017'
How do I convert it to a datetime.datetime(2017,5,16) object?
Thx.
You need to create a custom date parser. To give you some ideas, here's a reproducible example:
import pandas as pd
import datetime
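from io import StringIO  # on Python 2: from StringIO import StringIO
st = u'01:19 AM Eastern, Tuesday, May 16, 2017'
def parse_date(date_string):
    # Keep only the last two comma-separated fields, e.g. "May 16, 2017"
    date_string = ",".join(date_string.split(',')[-2:]).strip()
    return datetime.datetime.strptime(date_string, '%B %d, %Y')
# Note: the date_parser argument is deprecated as of pandas 2.0
df = pd.read_csv(StringIO(st), header=None, sep="|", date_parser=parse_date, parse_dates=[0])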
If you print the DataFrame content as follows:
print("dataframe content")
print(df)
you will get this output:
dataframe content
0
0 2017-05-16
Checking the dtypes confirms that the column is now of type datetime:
print("dataframe types")
print(df.dtypes)
Output:
dataframe types
0 datetime64[ns]
dtype: object
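If the string is already in hand (or sitting in a column), you may not need `read_csv` at all. A minimal sketch, assuming English month names and that every value ends in a "Month day, year" pattern; the second sample string and the regular expression are my own additions:

```python
import datetime

import pandas as pd

st = u'1:19 AM Eastern, Tuesday, May 16, 2017'

# Single string: keep the last two comma-separated fields ("May 16", "2017").
date_part = ",".join(st.split(",")[-2:]).strip()
dt = datetime.datetime.strptime(date_part, "%B %d, %Y")

# Whole column: pull the trailing "Month day, year" text, then parse it.
# The second value is a made-up example in the same layout.
s = pd.Series([st, u'11:05 PM Eastern, Monday, June 12, 2017'])
date_text = s.str.extract(r"(\w+ \d{1,2}, \d{4})$", expand=False)
dates = pd.to_datetime(date_text, format="%B %d, %Y")
```

Here `dt` is a plain `datetime.datetime`, while `dates` is a datetime64 Series. The time of day and the "Eastern" zone are simply discarded, which matches what the question asks for.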
I need to build a DataFrame with a very specific structure. Yield curve values as the data, a single date as the index, and days to maturity as the column names.
In[1]: yield_data # list of size 38, with yield values
Out[1]:
[0.096651956137087325,
0.0927199778042056,
0.090000225505577847,
0.088300016028163508,...
In[2]: maturity_data # list of size 38, with days until maturity
Out[2]:
[6,
29,
49,
70,...
In[3]: today
Out[3]:
Timestamp('2017-07-24 00:00:00')
Then I try to create the DataFrame
pd.DataFrame(data=yield_data, index=[today], columns=maturity_data)
but it returns the error
ValueError: Shape of passed values is (1, 38), indices imply (38, 1)
I tried transposing these lists, but plain Python lists cannot be transposed.
How can I create this DataFrame?
IIUC, you want a DataFrame with a single row. For that, wrap your data list in another list so that it becomes a list of lists (one inner list per row):
yield_data = [0.09,0.092, 0.091]
maturity_data = [6,10,15]
today = pd.to_datetime('2017-07-25')
pd.DataFrame(data=[yield_data],index=[today],columns=maturity_data)
Output:
6 10 15
2017-07-25 0.09 0.092 0.091
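The error comes from the shape: a flat list of N values is read as N rows of one column, while one index label and N column names describe 1 row of N columns. Two other ways to get the 1 x N shape, sketched with the same toy values (the names `df_a` and `df_b` are just for illustration):

```python
import numpy as np
import pandas as pd

yield_data = [0.09, 0.092, 0.091]
maturity_data = [6, 10, 15]
today = pd.Timestamp("2017-07-25")

# Option A: reshape the values into an explicit 1 x N array.
arr = np.asarray(yield_data).reshape(1, -1)
df_a = pd.DataFrame(arr, index=[today], columns=maturity_data)

# Option B: build a Series keyed by maturity and named by the date,
# then turn it into a one-column frame and transpose it.
df_b = pd.Series(yield_data, index=maturity_data, name=today).to_frame().T
```

Both give a frame with one row labelled by the date and one column per maturity.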
I have this dataset, where columns are values across the years and rows values in Sterling Pounds:
total_2013 total_2014 total_2015 total_2016
0 1569000 1614000 1644000 1659000
1 330000 423000 474000 540000
2 2100000 2080000 2093000 2135000
3 2161000 2238000 2221000 2200000
4 1865000 1975000 2046000 2152000
5 1903000 1972000 1970000 2034000
6 2087000 2091000 1963000 1956000
7 1237000 1231000 1199000 1188000
8 1043000 1072000 1076000 1059000
9 569000 610000 564000 592000
10 2207000 2287000 2191000 2274000
11 1908000 1908000 1917000 1908000
I need to plot the columns total_2013, total_2014, total_2015, total_2016 on the X-axis.
On the Y-axis, I need to draw a point for each value in a row and connect those points with a line across the years 2013-2016.
I couldn't get xticks to place the column names as separate time points on the X-axis, and I'm not sure that's the right approach either.
So far I have plotted this:
import matplotlib
import matplotlib.pyplot as plt
import pandas as pd
matplotlib.style.use('ggplot')
ax = data_merged_years.plot(kind='line', title ="UK Schools budget per year 2013-2016", figsize=(15, 10), legend=True, fontsize=12)
ax.set_xlabel("Year", fontsize=12)
ax.set_ylabel("Pounds", fontsize=12)
plt.show()
[Figure: my attempt, a wrong plot with incorrect X and Y]
With your script the output I can produce is this:
If you want the columns as x-axis rather than different lines you could transpose the DataFrame before plotting:
In [13]: import matplotlib
...: import matplotlib.pyplot as plt
...: import pandas as pd
...: matplotlib.style.use('ggplot')
...: # Transpose so that the columns (years) run along the x-axis
...: ax = data_merged_years.transpose().plot(kind='line', title ="UK Schools budget per year 2013-2016", figsize=(15, 10), legend=True, fontsize=12)
...: ax.set_xlabel("Year", fontsize=12)
...: ax.set_ylabel("Pounds", fontsize=12)
...: plt.show()
From your original plot it seems that you would have over ten thousand lines in one plot. Are you sure that is what you want?
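To get one tick per year with a point at each value, something along these lines might help. It is a sketch only: it uses the first three rows from the question, and it assumes every column name ends in a four-digit year.

```python
import pandas as pd
import matplotlib.pyplot as plt

# First three rows of the data shown in the question.
data_merged_years = pd.DataFrame({
    "total_2013": [1569000, 330000, 2100000],
    "total_2014": [1614000, 423000, 2080000],
    "total_2015": [1644000, 474000, 2093000],
    "total_2016": [1659000, 540000, 2135000],
})

# Transpose: years become the index (x-axis), each original row becomes a line.
by_year = data_merged_years.T

# Assumption: the last four characters of each column name are the year.
by_year.index = by_year.index.str[-4:].astype(int)

ax = by_year.plot(kind="line", marker="o", xticks=list(by_year.index),
                  title="UK Schools budget per year 2013-2016")
ax.set_xlabel("Year")
ax.set_ylabel("Pounds")
```

Passing `xticks` explicitly keeps matplotlib from inserting fractional ticks such as 2013.5 between the years.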
I have a time series similar to:
ts = pd.Series(np.random.randn(60),index=pd.date_range('1/1/2000',periods=60, freq='2h'))
Is there an easy way to make it so that the row index is dates and the column index is the hour?
Basically I am trying to convert from a time-series into a dataframe.
There's always a slicker way to do things than the way I reach for, but I'd make a flat frame first and then pivot. Something like
>>> ts = pd.Series(np.random.randn(10000),index=pd.date_range('1/1/2000',periods=10000, freq='10min'))
>>> df = pd.DataFrame({"date": ts.index.date, "time": ts.index.time, "data": ts.values})
>>> df = df.pivot(index="date", columns="time", values="data")
This produces too large a frame to paste, but looking at the top-left corner:
>>> df.iloc[:5, :5]
time 00:00:00 00:10:00 00:20:00 00:30:00 00:40:00
date
2000-01-01 -0.180811 0.672184 0.098536 -0.687126 -0.206245
2000-01-02 0.746777 0.630105 0.843879 -0.253666 1.337123
2000-01-03 1.325679 0.046904 0.291343 -0.467489 -0.531110
2000-01-04 -0.189141 -1.346146 1.378533 0.887792 2.957479
2000-01-05 -0.232299 -0.853726 -0.078214 -0.158410 0.782468
[5 rows x 5 columns]
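For the 2-hourly series in the question, where the columns should be the hour, the same flat-then-pivot approach could look like the sketch below. The names `flat` and `wide` are made up, and it assumes at most one observation per (date, hour) pair, since `pivot` raises on duplicates.

```python
import numpy as np
import pandas as pd

ts = pd.Series(np.random.randn(60),
               index=pd.date_range('1/1/2000', periods=60, freq='2h'))

# Flatten to one row per observation, keeping the date and the hour of day.
flat = pd.DataFrame({"date": ts.index.date,
                     "hour": ts.index.hour,
                     "data": ts.values})

# Dates down the rows, hours across the columns.
wide = flat.pivot(index="date", columns="hour", values="data")
```

With 60 points at a 2-hour spacing this gives 5 rows (dates) and 12 columns (hours 0, 2, ..., 22).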