How to control scientific notation in matplotlib? - python-2.7

This is my data frame I'm trying to plot:
my_dic = {'stats': {'apr': 23083904,
                    'may': 16786816,
                    'june': 26197936,
                    }}
my_df = pd.DataFrame(my_dic)
my_df.head()
This is how I plot it:
ax = my_df['stats'].plot(kind='bar', legend=False)
ax.set_xlabel("Month", fontsize=12)
ax.set_ylabel("Stats", fontsize=12)
ax.ticklabel_format(useOffset=False) #AttributeError: This method only works with the ScalarFormatter.
plt.show()
The plot:
I'd like to control the scientific notation. I tried to suppress it with plt.ticklabel_format(useOffset=False), as suggested in other questions, but I get this error back: AttributeError: This method only works with the ScalarFormatter. Ideally, I'd like to show my data in millions (mln).

Adding this line helps to get numbers in a plain format but with ',' which looks much nicer:
ax.get_yaxis().set_major_formatter(
    matplotlib.ticker.FuncFormatter(lambda x, p: format(int(x), ',')))
And then I can use int(x)/ to convert to millions or thousands as I wish:
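A minimal sketch of the millions variant, assuming a divisor of 1e6 and an "M" suffix. The helper name millions, the suffix, and the output file name are illustrative choices, not part of the original post:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for a window
import pandas as pd
from matplotlib.ticker import FuncFormatter

my_df = pd.DataFrame({'stats': {'apr': 23083904,
                                'may': 16786816,
                                'june': 26197936}})

def millions(x, pos):
    """Format a tick value in millions with one decimal place."""
    return '{:.1f}M'.format(x / 1e6)

ax = my_df['stats'].plot(kind='bar', legend=False)
ax.set_xlabel("Month", fontsize=12)
ax.set_ylabel("Stats (mln)", fontsize=12)
# Only the y axis gets the custom formatter; the x axis keeps its month labels.
ax.yaxis.set_major_formatter(FuncFormatter(millions))
ax.figure.savefig("stats_millions.png")
```

Changing the divisor to 1e3 and the suffix to "K" would give thousands instead.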

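Since you are already using pandas, you can plot with it and restrict ticklabel_format to the y axis (the x axis of a bar plot does not use a ScalarFormatter, which is what triggers the error):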
import matplotlib.pyplot as plt
my_df.plot(kind='bar')
plt.ticklabel_format(style='plain', axis='y')

Related

How to plot graph from file using Python, problem of the junction of lines

I'm new to python and have a question. I have a file.csv file that contains two columns.
FILE.csv
0.0000 9.0655
0.0048 9.0640
0.0096 9.0592
0.0144 9.0510
0.0192 9.0392
0.0240 9.0233
0.0288 9.0028
0.0336 8.9770
0.0384 8.9451
0.0432 8.9063
0.0480 8.8595
0.0528 8.8039
0.0576 8.7385
0.0624 8.6626
0.0000 11.0013
0.0048 11.0018
0.0096 11.0032
0.0144 11.0057
0.0192 11.0091
0.0240 11.0134
0.0288 11.0186
0.0336 11.0247
0.0384 11.0317
0.0432 11.0394
0.0480 11.0478
0.0528 11.0569
0.0576 11.0666
0.0624 11.0767
0.0672 11.0873
I tried to plot the graph from FILE.csv with xmgrace and Gnuplot, and the result is very convincing.
I get two lines in the graph, as shown in the two figures below:
[two figures: xmgrace and Gnuplot output, each showing two separate curves]
On the other hand, if I use my Python script, the two lines are joined.
Here is my script:
import matplotlib.pyplot as plt
import pylab as plt
#
with open('bb.gnu') as f:
    f=[x.strip() for x in f if x.strip()]
    data=[tuple(map(float,x.split())) for x in f[2:]]
BX1=[x[0] for x in data]
BY1=[x[1] for x in data]
plt.figure(figsize=(8,6))
ax = plt.subplot(111)
plt.plot(BX1, BY1, 'k-', linewidth=2 ,label='Dos')
plt.plot()
plt.savefig("Fig.png", dpi=100)
plt.show()
And here's the result
[figure: matplotlib output with the two curves joined by a line]
My question: is there a way to plot the graph with Python without the junction between the two lines, so that the result matches Gnuplot and xmgrace?
Thank you in advance for your help.
To my knowledge, matplotlib is only joining your two curves because you provide them as one set of data. This means that you need to call plot twice in order to generate two curves. I put your data in a file called data.csv and wrote the following piece of code:
import numpy
import matplotlib.pyplot as plt
data = numpy.genfromtxt('data.csv')
starts = numpy.asarray(data[:, 0] == 0).nonzero()[0]
fig, ax = plt.subplots(nrows=1, ncols=1, num=0, figsize=(16, 8))
for i in range(starts.shape[0]):
    if i == starts.shape[0] - 1:
        ax.plot(data[starts[i]:, 0], data[starts[i]:, 1])
    else:
        ax.plot(data[starts[i]:starts[i + 1], 0],
                data[starts[i]:starts[i + 1], 1])
plt.show()
which generates this figure
With starts, I look for the rows whose first column contains the value 0, which I take to be the start of a new curve. The loop then draws one curve per iteration. The if statement distinguishes the last curve, which runs to the end of the data, from the others. There is probably a more elegant way, but it works.
Also, do not import pylab; it is discouraged because it fills the namespace unnecessarily.
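One possibly tidier variant of the same idea, as a sketch: numpy.split can cut the array at every restart of x, which removes the special case for the last curve. The inline sample below is a shortened stand-in for data.csv, and the output file name is arbitrary:

```python
import io
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for a window
import matplotlib.pyplot as plt

# Shortened stand-in for data.csv: two curves, each restarting at x = 0.
text = """0.0000 9.0655
0.0048 9.0640
0.0096 9.0592
0.0000 11.0013
0.0048 11.0018
0.0096 11.0032
0.0144 11.0057"""
data = np.genfromtxt(io.StringIO(text))

# Row indices where x == 0, i.e. where a curve starts.
starts = np.flatnonzero(data[:, 0] == 0)

# Split before every start except the first: one array per curve.
curves = np.split(data, starts[1:])

fig, ax = plt.subplots()
for curve in curves:
    ax.plot(curve[:, 0], curve[:, 1])
fig.savefig("curves.png")
```

This still assumes, as above, that every curve begins with x exactly equal to 0.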

How can I save histogram plot in python?

I have the following code that generates a histogram. How can I save the histogram automatically from the code? I tried what we do for other plot types, but that did not work for the histogram. a is a numpy.ndarray.
a = [-0.86906864 -0.72122614 -0.18074998 -0.57190212 -0.25689268 -1.
0.68713553 0.29597819 0.45022949 0.37550592 0.86906864 0.17437203
0.48704826 0.2235648 0.72122614 0.14387731 0.94194514 ]
fig = pl.hist(a,normed=0)
pl.title('Mean')
pl.xlabel("value")
pl.ylabel("Frequency")
pl.savefig("abc.png")
This works for me:
import matplotlib.pyplot as pl
import numpy as np
a = np.array([-0.86906864, -0.72122614, -0.18074998, -0.57190212, -0.25689268 ,-1. ,0.68713553 ,0.29597819, 0.45022949, 0.37550592, 0.86906864, 0.17437203, 0.48704826, 0.2235648, 0.72122614, 0.14387731, 0.94194514])
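# note: newer Matplotlib releases removed 'normed'; density=False is the equivalent of normed=0
fig = pl.hist(a, density=False)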
pl.title('Mean')
pl.xlabel("value")
pl.ylabel("Frequency")
pl.savefig("abc.png")
The a in the original post is not a numpy array, and its format also needs to be modified (it needs commas, not spaces, as delimiters). This program successfully saves the histogram in the working directory. If it still does not work, give savefig a full path to the location where you want to save the file, like this:
pl.savefig("/Users/atru/abc.png")
The pl.show() statement should not be placed before savefig(): once the figure window has been shown and closed, savefig() works on a new, empty figure and saves a blank image instead of the desired one, as explained in this post.
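A small sketch of the save-then-show order using an explicit figure object, which avoids relying on which figure is "current". The temporary-directory path is an assumption made for the example; any writable path works:

```python
import os
import tempfile
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for a window
import matplotlib.pyplot as plt

a = np.array([-0.86906864, -0.72122614, -0.18074998, -0.57190212,
              -0.25689268, -1., 0.68713553, 0.29597819, 0.45022949,
              0.37550592, 0.86906864, 0.17437203, 0.48704826, 0.2235648,
              0.72122614, 0.14387731, 0.94194514])

fig, ax = plt.subplots()
# hist returns the bin counts, the bin edges and the drawn patches.
counts, bin_edges, patches = ax.hist(a)
ax.set_title("Mean")
ax.set_xlabel("value")
ax.set_ylabel("Frequency")

out_path = os.path.join(tempfile.gettempdir(), "abc.png")
fig.savefig(out_path)  # save first ...
plt.show()             # ... then show (does nothing useful with the Agg backend)
```

Calling fig.savefig on the figure object, rather than pl.savefig, keeps the save tied to this figure even if other figures are created later.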

Graphing multiple data sets using function to extract data from dictionary (matplotlib)

I would like to plot select data from a dictionary of the following format:
dictdata = {"key_A": [(1,2),(1,3)], "key_B": [(3,2),(2,3)], "key_C": [(4,2),(1,4)]}
I am using the following function to extract data corresponding to a specific key and then separate the x and y values into two lists which can be plotted.
def plot_dictdata(ax1, key):
    data = list()
    data.append(dictdata[key])
    for list_of_points in data:
        for point in list_of_points:
            x = point[0]
            y = point[1]
    ax1.scatter(x,y)
I'd like to be able to call this function multiple times (see code below) and have all relevant sets of data appear on the same graph. However, the final plot only shows the last set of data. How can I graph all sets of data on the same graph without clearing the previous set of data?
fig, ax1 = plt.subplots()
plot_dictdata(ax1, "key_A")
plot_dictdata(ax1, "key_B")
plot_dictdata(ax1, "key_C")
plt.show()
I have only just started using matplotlib, and wasn't able to figure out a solution using the following examples discussing related problems. Thank you in advance.
how to add a plot on top of another plot in matplotlib?
How to draw multiple line graph by using matplotlib in Python
Plotting a continuous stream of data with MatPlotLib
The problem may be at a different point than you think. The reason only the last point of each data set gets plotted is that x and y are reassigned in each loop step, so that at the end of the loop each of them contains a single value.
As a solution you might want to use a list to append the values to, like
import matplotlib.pyplot as plt
dictdata = {"key_A": [(1,2),(1,3)], "key_B": [(3,2),(2,3)], "key_C": [(4,2),(1,4)]}
def plot_dictdata(ax1, key):
    data = list()
    data.append(dictdata[key])
    x, y = [], []
    for list_of_points in data:
        for point in list_of_points:
            x.append(point[0])
            y.append(point[1])
    ax1.scatter(x,y)
fig, ax1 = plt.subplots()
plot_dictdata(ax1, "key_A")
plot_dictdata(ax1, "key_B")
plot_dictdata(ax1, "key_C")
plt.show()
resulting in
It would be worth noting that the plot_dictdata function could be simplified a lot, giving the same result as the above:
def plot_dictdata(ax1, key):
    x,y = zip(*dictdata[key])
    ax1.scatter(x,y)
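For readers unfamiliar with the zip(*...) idiom used in the simplified function, a tiny illustration of what it does to one list of points (the variable name points is only for this example):

```python
points = [(1, 2), (1, 3)]

# The * unpacks the list into separate arguments, so this is zip((1, 2), (1, 3)).
# zip then pairs up the first elements and the second elements.
x, y = zip(*points)

print(x)  # (1, 1)
print(y)  # (2, 3)
```

Note that the unpacking into x and y fails on an empty list of points, so an empty data set would need a guard.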

Displaying timestamp in textarea using matplotlib

I have a map that I plotted using matplotlib from a CSV file that I read using pandas. I need to display the dates of my data in a text area, so I am doing this:
Start =data.index.max()
End = data.index.min()
txt = 'Date debut:',End,'Date fin:',Start
props1 = dict(boxstyle='round', facecolor='wheat', alpha=0.5)
ax.text(0.17, 0.17, txt, transform=ax.transAxes, fontsize=8, bbox=props1, family = 'monospace')
plt.show()
And I got this result:
As you can see, it's not a really satisfying result. I need to move the text to the bottom right, outside the map, insert a space between "Date debut" and "Date fin", and finally hide the 'Timestamp' from the text area and leave only the dates. How can I proceed?
The text can be positioned using the first two arguments; just replace the numbers 0.17 with something else. In this respect it may help to use ha and va (horizontal and vertical alignment) and set them such that the coordinates can be easily chosen (e.g. ha="right" makes sense when specifying coordinates at the right side of the plot). Note that you may well choose negative values if that makes sense to you.
To format the string nicely you first want to convert the Timestamp to a string. This is done using the strftime method. As argument you specify a formatting string, e.g. "%d %b %Y" for a day month year format. A complete set of formatting options can be found in the Python documentation.
A complete example may be:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
d = pd.date_range("2017-01-01","2017-06-30",freq="D" )
x = np.random.rand(len(d))
data = pd.DataFrame(x, index=d)
fig, ax = plt.subplots()
start = data.index.min().strftime("%d %b %Y")
end = data.index.max().strftime("%d %b %Y")
txt = "Date debut: {}, date fin: {}".format(start, end)
props1 = dict(boxstyle='round', facecolor='wheat', alpha=0.5)
ax.text(0.98, 0.03, txt, transform=ax.transAxes, fontsize=8, bbox=props1,
        family = 'monospace', ha="right", va="bottom")
plt.show()
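The question also asked for the text to sit outside the plotted area. A sketch of that, assuming a negative y coordinate in axes fractions and some extra bottom margin; the values -0.12 and 0.2 and the output file name are arbitrary choices for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for a window
import matplotlib.pyplot as plt
import pandas as pd

index = pd.date_range("2017-01-01", "2017-06-30", freq="D")
start = index.min().strftime("%d %b %Y")
end = index.max().strftime("%d %b %Y")
txt = "Date debut: {}   Date fin: {}".format(start, end)

fig, ax = plt.subplots()
fig.subplots_adjust(bottom=0.2)  # leave room under the axes for the box

props1 = dict(boxstyle='round', facecolor='wheat', alpha=0.5)
# x = 1.0 is the right edge of the axes; a negative y puts the text below them.
label = ax.text(1.0, -0.12, txt, transform=ax.transAxes, fontsize=8,
                bbox=props1, family='monospace', ha="right", va="top")
fig.savefig("dates_below_axes.png")
```

With ha="right" and va="top", the given coordinates refer to the top right corner of the text box, which makes it easy to line the box up with the right edge of the axes.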

Remove weekends in finance plots with volume overlay [duplicate]

I've been having some difficulty with Matplotlib's finance charting. It seems like their candlestick charts work best with daily data, and I am having a hard time making them work with intraday (every 5 minutes, between 9:30 and 4 pm) data.
I have pasted sample data in pastebin. The top is what I get from the database, and the bottom is tupled with the date formatted into an ordinal float for use in Matplotlib.
Link to sample data
When I draw my charts there are huge gaps in it, the axes suck, and the zoom is equally horrible. http://imgur.com/y7O8A
How do I make a nice readable graph out of this data? My ultimate goal is to get a chart that looks remotely like this:
http://i.imgur.com/EnrTW.jpg
The data points can be in various increments from 5 minutes to 30 minutes.
I have also made a Pandas dataframe of the data, but I am not sure if pandas has candlestick functionality.
If I understand correctly, one of your major concerns is the gaps between the daily data.
To get rid of them, one method is to artificially space your data evenly (but of course you will lose any indication of the time within the day).
Done this way, you will be able to obtain a chart that looks like the one you proposed as an example.
The commented code and the resulting graph are below.
import numpy as np
import matplotlib.pyplot as plt
import datetime
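# note: matplotlib.finance was removed from newer Matplotlib releases;
# the separate mpl_finance package provides candlestick_ochl instead
from matplotlib.finance import candlestick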
from matplotlib.dates import num2date
# data in a text file, 5 columns: time, opening, close, high, low
# note that I'm using the time you formatted into an ordinal float
data = np.loadtxt('finance-data.txt', delimiter=',')
# determine number of days and create a list of those days
ndays = np.unique(np.trunc(data[:,0]), return_index=True)
xdays = []
for n in np.arange(len(ndays[0])):
    xdays.append(datetime.date.isoformat(num2date(data[ndays[1],0][n])))
# creation of new data by replacing the time array with equally spaced values.
# this will allow to remove the gap between the days, when plotting the data
data2 = np.hstack([np.arange(data[:,0].size)[:, np.newaxis], data[:,1:]])
# plot the data
fig = plt.figure(figsize=(10, 5))
ax = fig.add_axes([0.1, 0.2, 0.85, 0.7])
# customization of the axis
ax.spines['right'].set_color('none')
ax.spines['top'].set_color('none')
ax.xaxis.set_ticks_position('bottom')
ax.yaxis.set_ticks_position('left')
ax.tick_params(axis='both', direction='out', width=2, length=8,
               labelsize=12, pad=8)
ax.spines['left'].set_linewidth(2)
ax.spines['bottom'].set_linewidth(2)
# set the ticks of the x axis only when starting a new day
ax.set_xticks(data2[ndays[1],0])
ax.set_xticklabels(xdays, rotation=45, horizontalalignment='right')
ax.set_ylabel('Quote ($)', size=20)
ax.set_ylim([177, 196])
candlestick(ax, data2, width=0.5, colorup='g', colordown='r')
plt.show()
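The even-spacing idea can be tried without the finance module. A minimal sketch, assuming synthetic 30-minute bars over three days (the prices are made up) and a plain line in place of candlesticks:

```python
import datetime
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for a window
import matplotlib.pyplot as plt

# Synthetic intraday timestamps: 3 days, one bar every 30 minutes, 09:30 to 16:00.
days = [datetime.date(2012, 9, d) for d in (5, 6, 7)]
times = [datetime.datetime.combine(d, datetime.time(9, 30))
         + datetime.timedelta(minutes=30 * k)
         for d in days for k in range(14)]
rng = np.random.default_rng(0)
close = 185 + np.cumsum(rng.normal(0, 0.3, len(times)))  # made-up prices

# Evenly spaced x positions: one integer per bar, so the overnight gaps vanish.
x = np.arange(len(times))

# Index of the first bar of each day, used as tick positions.
ordinals = np.array([t.toordinal() for t in times])
_, day_starts = np.unique(ordinals, return_index=True)

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(x, close)
ax.set_xticks(x[day_starts])
ax.set_xticklabels([times[i].date().isoformat() for i in day_starts],
                   rotation=45, ha="right")
fig.savefig("evenly_spaced.png")
```

The x axis no longer carries real time, so tick labels have to be set by hand, as is done here at the first bar of each day.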
I got tired of matplotlib's (and plotly's) bad performance and lack of the features you request, so I implemented a library of my own. Here's how that works:
import finplot as fplt
import yfinance
df = yfinance.download('AAPL')
fplt.candlestick_ochl(df[['Open', 'Close', 'High', 'Low']])
fplt.show()
Not only are days on which the exchange is closed left out automatically, but it also has better performance and a nicer API. For something that more closely resembles what you're ultimately looking for:
import finplot as fplt
import yfinance
symbol = 'AAPL'
df = yfinance.download(symbol)
ax = fplt.create_plot(symbol)
fplt.candlestick_ochl(df[['Open', 'Close', 'High', 'Low']], ax=ax)
fplt.plot(df['Close'].rolling(200).mean(), ax=ax, legend='SMA 200')
fplt.plot(df['Close'].rolling(50).mean(), ax=ax, legend='SMA 50')
fplt.plot(df['Close'].rolling(20).mean(), ax=ax, legend='SMA 20')
fplt.volume_ocv(df[['Open', 'Close', 'Volume']], ax=ax.overlay())
fplt.show()