Basically, I want to not plot extremes in my graph. I thought doing this based on the slope of the graph would be a good idea, but for some reason I keep getting the error that the dates on my x-axis do not exist (DataFrame has no attribute Datumtijd). (Edit: Removed file location as question has been answered)
from pylab import *
import matplotlib.pyplot as plt
import matplotlib.dates as pld
%matplotlib inline
import pandas as pd
from pandas import DataFrame
pbn135 = pd.read_csv('3873_135.csv', parse_dates=[0], index_col = 0, dayfirst = True, delimiter = ';', usecols = ['Datumtijd','DisplayWaarde'])
pbn135.plot()
for i in range(len(pbn135)):
slope = (pbn135.DisplayWaarde[i+1]-pbn135.DisplayWaarde[i])/(pbn135.Datumtijd[i+1]-pbn135.Datumtijd[i])
Python can't operate with DateTime. Converting the DateTime to an integer works. Usually done by calculating the total seconds from a reference date (e.g. 1 jan 2015).
This is done by importing datetime from datetime. Then by setting a reference date datetime(2015,1,1) the seconds are calculted with total_seconds().
However this does create a slope where the interval is in seconds and not the interval of your datetime. If anyone knows how to fix that without manually entering a division please let us know
from datetime import datetime
for i in range(len(pbn135)):
slope = (pbn135.pbn73[i+1]-pbn135.pbn73[i])/((pbn135.index[i+1]-datetime(2015,1,1)).total_seconds()-(pbn135.index[i]-datetime(2015,1,1)).total_seconds())
print slope
Related
I am on python 2.7, with spyder IDE and this is my data:
Duration ptno
7432.0 X35133502100
7432.0 X35133502100
35255.0 T7956000304
35255.0 T7956000304
17502.0 T7956000304
17502.0 T7956000304
46.0 T7956000304
46.0 T7956000304
The code:
import time
import pandas as pd
import matplotlib.pyplot as plt
df1 = pd.read_csv('Nissin_11.09.2018.csv')
bx = df1.plot.bar(x='ptno', y='d', rot=0)
plt.setp(bx.get_xticklabels(),rotation=30,horizontalalignment='right')
plt.show()
I get a nice bar plot as I wanted for each value mentioned in columns Duration & ptno. For reference I am attaching image file of the plot.
But when I try to get a scatter plot with:
df1.plot.scatter(x='ptno', y='d')
It throws a error as :
ValueError: scatter requires x column to be numeric
How can I have a 'scatter' plot for my data ??
As suggested by #Hristo Iliev I used his code:
import seaborn as sns
_ = sns.stripplot(x='ptno', y='d', data=df1)
But It only plot two unique values on axis where I would like to have all values on x axis as my bar plot has x axis values.
One option is to use pure matplotlib. You need to create an array of numbers to use as the x axis, i.e. [1,2,3,4,5,...] and then change the tick labels to the value of the column ptno.
For example:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df1 = pd.DataFrame({"Duration":[7432,7432,35255,35255,17502,17502,46,46],
"ptno":["X35", "X35", "T79", "T79", "T79", "T79", "T79", "T79"]})
dummy_x = np.arange(len(df1.ptno))
plt.scatter(dummy_x, df1.Duration)
plt.xticks(dummy_x, df1.ptno)
plt.show()
You cannot make scatter plots with non-numeric values as indicated by the error. In a scatter plot, the position of each point is determined by the location on the real axis of the value of each variable. Categorical or string values such as T7956000304 have no direct mapping to a position on the real axis.
What you can plot though is a series of strip plots, one for each unique value of ptno. That's easiest to do with Seaborn:
import seaborn as sns
_ = sns.stripplot(x='ptno', y='d', data=df1)
Cant figure out how to combine these two plots.
Here is the relevant code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%pylab inline #Jupyter Notebook
df_state.head()
lm_orginal_plot.head()
outputs for .head()
df_state.loc['Alabama'][['month','total purchases',
'permit','permit_recheck']].plot(x='month',figsize=(6,3),linestyle='-', marker='o')
lm_original_plot.plot(x='month',figsize=(6,3),linestyle=':');
outputs for plots
This is how I would do this (not saying it is the best method or anything):
1) merge two dfs on month
all = df_state(lm_original_plot, on = 'month', how='left')
2) create figure (total is now column just like the other variables in
the first chart, so you can just add ‘total’ to your first chart code)
Not my work, just what a peer showed me.
I have a csv file and I am trying to plot the average of some values per month. My csv file is structured as shown below, so I believe that I should group my data daily, then monthly in order to calculate the mean value.
timestamp,heure,lat,lon,impact,type
2007-01-01 00:00:00,13:58:43,33.837,-9.205,10.3,1
2007-01-02 00:00:00,00:07:28,34.5293,-10.2384,17.7,1
2007-01-02 00:00:00,23:01:03,35.0617,-1.435,-17.1,2
2007-01-03 00:00:00,01:14:29,36.5685,0.9043,36.8,1
2007-01-03 00:00:00,05:03:51,34.1919,-12.5061,-48.9,1
I am using this code:
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
df= pd.read_csv("ave.txt", sep=',', names =["timestamp","heure","lat","lon","impact","type"])
daily = df.set_index('timestamp').groupby(pd.TimeGrouper(key='timestamp', freq='D', axis=1), axis=1)['impact'].count()
monthly = daily.groupby(pd.TimeGrouper(freq='M')).mean()
ax = monthly.plot(kind='bar')
plt.show()
But, I keep getting errors like this:
KeyError: 'The grouper name timestamp is not found'
any ideas ??
You're getting this error because you've set the timestamp column to index. Try removing key='timestamp' from TimeGrouper() or the set_index method and it should group as you expect:
daily = df.set_index('timestamp').groupby(pd.TimeGrouper(freq='D', axis=1), axis=1)['impact'].count()
or
daily = df.groupby(pd.TimeGrouper(key='timestamp', freq='D', axis=1), axis=1)['impact'].count()
I believe you need DataFrame.resample.
Also is necessary convert timestamp to DataTimeindex by parameter parse_dates and index_col in read_csv.
names =["timestamp","heure","lat","lon","impact","type"]
data = pd.read_csv('fou.txt',names=names, parse_dates=['timestamp'],index_col=['timestamp'])
print (data.head())
#your code
daily = data.groupby(pd.TimeGrouper(freq='D'))['impact'].count()
monthly = daily.groupby(pd.TimeGrouper(freq='M')).mean()
ax = monthly.plot(kind='bar')
plt.show()
#more simpliest
daily = data.resample('D')['impact'].count()
monthly = daily.resample('M').mean()
ax = monthly.plot(kind='bar')
plt.show()
Also check if really need count, not size.
What is the difference between size and count in pandas?
daily = data.resample('D')['impact'].size()
monthly = daily.resample('M').mean()
ax = monthly.plot(kind='bar')
plt.show()
I've been having some difficulty with Matplotlib's finance charting. It seems like their candlestick charts work best with daily data, and I am having a hard time making them work with intraday (every 5 minutes, between 9:30 and 4 pm) data.
I have pasted sample data in pastebin. The top is what I get from the database, and the bottom is tupled with the date formatted into an ordinal float for use in Matplotlib.
Link to sample data
When I draw my charts there are huge gaps in it, the axes suck, and the zoom is equally horrible. http://imgur.com/y7O8A
How do I make a nice readable graph out of this data? My ultimate goal is to get a chart that looks remotely like this:
http://i.imgur.com/EnrTW.jpg
The data points can be in various increments from 5 minutes to 30 minutes.
I have also made a Pandas dataframe of the data, but I am not sure if pandas has candlestick functionality.
If I understand well, one of your major concern is the gaps between the daily data.
To get rid of them, one method is to artificially 'evenly space' your data (but of course you will loose any temporal indication intra-day).
Anyways, doing this way, you will be able to obtain a chart that looks like the one you have proposed as an example.
The commented code and the resulting graph are below.
import numpy as np
import matplotlib.pyplot as plt
import datetime
from matplotlib.finance import candlestick
from matplotlib.dates import num2date
# data in a text file, 5 columns: time, opening, close, high, low
# note that I'm using the time you formated into an ordinal float
data = np.loadtxt('finance-data.txt', delimiter=',')
# determine number of days and create a list of those days
ndays = np.unique(np.trunc(data[:,0]), return_index=True)
xdays = []
for n in np.arange(len(ndays[0])):
xdays.append(datetime.date.isoformat(num2date(data[ndays[1],0][n])))
# creation of new data by replacing the time array with equally spaced values.
# this will allow to remove the gap between the days, when plotting the data
data2 = np.hstack([np.arange(data[:,0].size)[:, np.newaxis], data[:,1:]])
# plot the data
fig = plt.figure(figsize=(10, 5))
ax = fig.add_axes([0.1, 0.2, 0.85, 0.7])
# customization of the axis
ax.spines['right'].set_color('none')
ax.spines['top'].set_color('none')
ax.xaxis.set_ticks_position('bottom')
ax.yaxis.set_ticks_position('left')
ax.tick_params(axis='both', direction='out', width=2, length=8,
labelsize=12, pad=8)
ax.spines['left'].set_linewidth(2)
ax.spines['bottom'].set_linewidth(2)
# set the ticks of the x axis only when starting a new day
ax.set_xticks(data2[ndays[1],0])
ax.set_xticklabels(xdays, rotation=45, horizontalalignment='right')
ax.set_ylabel('Quote ($)', size=20)
ax.set_ylim([177, 196])
candlestick(ax, data2, width=0.5, colorup='g', colordown='r')
plt.show()
I got tired of matplotlib's (and plotly's) bad performance and lack of such features you request, so implemented one of my own. Here's how that works:
import finplot as fplt
import yfinance
df = yfinance.download('AAPL')
fplt.candlestick_ochl(df[['Open', 'Close', 'High', 'Low']])
fplt.show()
Not only are days in which the exchange is closed left out automatically, but also has better performance and a nicer api. For something that more resembles what you're ultimately looking for:
import finplot as fplt
import yfinance
symbol = 'AAPL'
df = yfinance.download(symbol)
ax = fplt.create_plot(symbol)
fplt.candlestick_ochl(df[['Open', 'Close', 'High', 'Low']], ax=ax)
fplt.plot(df['Close'].rolling(200).mean(), ax=ax, legend='SMA 200')
fplt.plot(df['Close'].rolling(50).mean(), ax=ax, legend='SMA 50')
fplt.plot(df['Close'].rolling(20).mean(), ax=ax, legend='SMA 20')
fplt.volume_ocv(df[['Open', 'Close', 'Volume']], ax=ax.overlay())
fplt.show()
Hello So I'm trying to trying to change the tick mark increments of the following plot to numbers that are more appropriate ie. increments of 1 on the x-axis and 10 on the y-axis.
Plot I'm trying to fix
The code I've tried is bellow:
Any help would be much appreciated!!!
import netCDF4
f = netCDF4.Dataset('AVSA.nc','r')
#plot Daily average
import matplotlib.pyplot as plt
v= f.variables['emissions'][0,0:24,0]
plt.plot(v, linestyle='-',linewidth=5.0, c='c')
plt.xlabel('Hour')
plt.ylabel('Emissions')
plt.title (' Emissions 3')
plt.ylim(0, 180)
plt.xlim(0,23)
plt.show()
You can write and position the ticks directly using:
plt.xticks(range(25),[str(i) for i in range(25)])
plt.yticks(range(0,180,10),[str(i) for i in range(0,180,10)])
In your code (I generated some data since I don't have your file) you would have:
import matplotlib.pyplot as plt
v= range(25)#f.variables['store_Bio'][0,0:24,0]
plt.plot(v, linestyle='-',linewidth=5.0, c='c')
plt.xticks(range(25),[str(i) for i in range(25)])
plt.yticks(range(0,180,10),[str(i) for i in range(0,180,10)])
plt.xlabel('Hour')
plt.ylabel('Average Biogenic Emissions')
plt.title ('Daily average Biogenic Emissions March 2013')
plt.ylim(0, 180)
plt.xlim(0,23)
plt.show()
, which results in: