Integrating an array in scipy with bounds. - python-2.7

I am trying to integrate over an array of data, but with bounds. Therfore I planned to use simps (scipy.integrate.simps). Because simps itself does not support bounds I decided to feed it only the selection of my data I want to integrate over. Yet this leads to strange results which are twice as big as the expected outcome.
What am I doing wrong, or what am I missing, or missunderstanding?
# -*- coding: utf-8 -*-
from scipy import integrate
from scipy import interpolate
import numpy as np
import matplotlib.pyplot as plt
# my data
x = np.linspace(-10, 10, 30)
y = x**2
# but I only want to integrate from 3 to 5
f = interpolate.interp1d(x, y)
x_selection = np.linspace(3, 5, 10)
y_selection = f(x_selection)
# quad returns the expected result
print 'quad', integrate.quad(f, 3, 5), '<- the expected value (includig error estimation)'
# but simps returns an uexpected result, when using the selected data
print 'simps', integrate.simps(x_selection, y_selection), '<- twice as big'
print 'trapz', integrate.trapz(x_selection, y_selection), '<- also twice as big'
plt.plot(x, y, marker='.')
plt.fill_between(x, y, 0, alpha=0.5)
plt.plot(x_selection, y_selection, marker='.')
plt.fill_between(x_selection, y_selection, 0, alpha=0.5)
plt.show()
Windows7, python2.7, scipy1.0.0

The Arguments for simps() and trapz() are in the wrong order.
You have flipped the calling arguments; simps and trapz expect first the y dimension, and second the x dimension, as per the docs. Once you have corrected this, similar results should obtain. Note that your example function admits a trivial analytic antiderivative, which would be much cheaper to evaluate.
– N. Wouda

Related

How to plot graph from file using Python, problem of the junction of lines

I'm new to python and have a question. I have a file.csv file that contains two columns.
FILE.csv
0.0000 9.0655
0.0048 9.0640
0.0096 9.0592
0.0144 9.0510
0.0192 9.0392
0.0240 9.0233
0.0288 9.0028
0.0336 8.9770
0.0384 8.9451
0.0432 8.9063
0.0480 8.8595
0.0528 8.8039
0.0576 8.7385
0.0624 8.6626
0.0000 11.0013
0.0048 11.0018
0.0096 11.0032
0.0144 11.0057
0.0192 11.0091
0.0240 11.0134
0.0288 11.0186
0.0336 11.0247
0.0384 11.0317
0.0432 11.0394
0.0480 11.0478
0.0528 11.0569
0.0576 11.0666
0.0624 11.0767
0.0672 11.0873
I tried to plot the graph from FILE.csv
with xmgrace and Gnuplot, and the result is very convincing.
I have two lines in the graph, as shown in the two figure below:
enter image description here
enter image description here
On the other hand, if I use my python script, the two lines are joined
here is my script:
import matplotlib.pyplot as plt
import pylab as plt
#
with open('bb.gnu') as f:
f=[x.strip() for x in f if x.strip()]
data=[tuple(map(float,x.split())) for x in f[2:]]
BX1=[x[0] for x in data]
BY1=[x[1] for x in data]
plt.figure(figsize=(8,6))
ax = plt.subplot(111)
plt.plot(BX1, BY1, 'k-', linewidth=2 ,label='Dos')
plt.plot()
plt.savefig("Fig.png", dpi=100)
plt.show()
And here's the result
enter image description here
My question, does it exist a solution to plot graph with Python, without generating the junction between the two lines.
In order to find a similar result to Gnuplot and xmgrace.
Thank you in advance for your help.
To my knowledge, matplotlib is only joining your two curves because you provide them as one set of data. This means that you need to call plot twice in order to generate two curves. I put your data in a file called data.csv and wrote the following piece of code:
import numpy
import matplotlib.pyplot as plt
data = numpy.genfromtxt('data.csv')
starts = numpy.asarray(data[:, 0] == 0).nonzero()[0]
fig, ax = plt.subplots(nrows=1, ncols=1, num=0, figsize=(16, 8))
for i in range(starts.shape[0]):
if i == starts.shape[0] - 1:
ax.plot(data[starts[i]:, 0], data[starts[i]:, 1])
else:
ax.plot(data[starts[i]:starts[i + 1], 0],
data[starts[i]:starts[i + 1], 1])
plt.show()
which generates this figure
What I do with starts is that I look for the rows in the first column of data which contain the value 0, which I consider to be the start of a new curve. The loop then generates a curve at each iteration. The if statement discerns between the last curve and the other ones. There is probably more elegant, but it works.
Also, do not import pylab, it is discouraged because of the unnecessary filling of the namespace.

How can I remove the negative sign from y tick labels in matplotlib.pyplot figure?

I am using the matplotlib.pylot module to generate thousands of figures that all deal with a value called "Total Vertical Depth(TVD)". The data that these values come from are all negative numbers but the industry standard is to display them as positive (I.E. distance from zero / absolute value). My y axis is used to display the numbers and of course uses the actual value (negative) to label the axis ticks. I do not want to change the values, but am wondering how to access the text elements and just remove the negative symbols from each value(shown in red circles on the image).
Several iterations of code after diving into the matplotlib documentation has gotten me to the following code, but I am still getting an error.
locs, labels = plt.yticks()
newLabels = []
for lbl in labels:
newLabels.append((lbl[0], lbl[1], str(float(str(lbl[2])) * -1)))
plt.yticks(locs, newLabels)
It appears that some of the strings in the "labels" list are empty and therefore the cast isn't working correctly, but I don't understand how it has any empty values if the yticks() method is retrieving the current tick configuration.
#SiHA points out that if we change the data then the order of labels on the y-axis will be reversed. So we can use a ticker formatter to just change the labels without changing the data as shown in the example below:
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import numpy as np
#ticker.FuncFormatter
def major_formatter(x, pos):
label = str(-x) if x < 0 else str(x)
return label
y = np.linspace(-3000,-1000,2001)
fig, ax = plt.subplots()
ax.plot(y)
ax.yaxis.set_major_formatter(major_formatter)
plt.show()
This gives me the following plot, notice the order of y-axis labels.
Edit:
based on the Amit's great answer, here's the solution if you want to edit the data instead of the tick formatter:
import matplotlib.pyplot as plt
import numpy as np
y = np.linspace(-3000,-1000,2001)
fig, ax = plt.subplots()
ax.plot(-y) # invert y-values of the data
ax.invert_yaxis() # invert the axis so that larger values are displayed at the bottom
plt.show()

Python: plot different kinds of colors [duplicate]

I am using matplotlib to create the plots. I have to identify each plot with a different color which should be automatically generated by Python.
Can you please give me a method to put different colors for different plots in the same figure?
Matplotlib does this by default.
E.g.:
import matplotlib.pyplot as plt
import numpy as np
x = np.arange(10)
plt.plot(x, x)
plt.plot(x, 2 * x)
plt.plot(x, 3 * x)
plt.plot(x, 4 * x)
plt.show()
And, as you may already know, you can easily add a legend:
import matplotlib.pyplot as plt
import numpy as np
x = np.arange(10)
plt.plot(x, x)
plt.plot(x, 2 * x)
plt.plot(x, 3 * x)
plt.plot(x, 4 * x)
plt.legend(['y = x', 'y = 2x', 'y = 3x', 'y = 4x'], loc='upper left')
plt.show()
If you want to control the colors that will be cycled through:
import matplotlib.pyplot as plt
import numpy as np
x = np.arange(10)
plt.gca().set_color_cycle(['red', 'green', 'blue', 'yellow'])
plt.plot(x, x)
plt.plot(x, 2 * x)
plt.plot(x, 3 * x)
plt.plot(x, 4 * x)
plt.legend(['y = x', 'y = 2x', 'y = 3x', 'y = 4x'], loc='upper left')
plt.show()
If you're unfamiliar with matplotlib, the tutorial is a good place to start.
Edit:
First off, if you have a lot (>5) of things you want to plot on one figure, either:
Put them on different plots (consider using a few subplots on one figure), or
Use something other than color (i.e. marker styles or line thickness) to distinguish between them.
Otherwise, you're going to wind up with a very messy plot! Be nice to who ever is going to read whatever you're doing and don't try to cram 15 different things onto one figure!!
Beyond that, many people are colorblind to varying degrees, and distinguishing between numerous subtly different colors is difficult for more people than you may realize.
That having been said, if you really want to put 20 lines on one axis with 20 relatively distinct colors, here's one way to do it:
import matplotlib.pyplot as plt
import numpy as np
num_plots = 20
# Have a look at the colormaps here and decide which one you'd like:
# http://matplotlib.org/1.2.1/examples/pylab_examples/show_colormaps.html
colormap = plt.cm.gist_ncar
plt.gca().set_prop_cycle(plt.cycler('color', plt.cm.jet(np.linspace(0, 1, num_plots))))
# Plot several different functions...
x = np.arange(10)
labels = []
for i in range(1, num_plots + 1):
plt.plot(x, i * x + 5 * i)
labels.append(r'$y = %ix + %i$' % (i, 5*i))
# I'm basically just demonstrating several different legend options here...
plt.legend(labels, ncol=4, loc='upper center',
bbox_to_anchor=[0.5, 1.1],
columnspacing=1.0, labelspacing=0.0,
handletextpad=0.0, handlelength=1.5,
fancybox=True, shadow=True)
plt.show()
Setting them later
If you don't know the number of the plots you are going to plot you can change the colours once you have plotted them retrieving the number directly from the plot using .lines, I use this solution:
Some random data
import matplotlib.pyplot as plt
import numpy as np
fig1 = plt.figure()
ax1 = fig1.add_subplot(111)
for i in range(1,15):
ax1.plot(np.array([1,5])*i,label=i)
The piece of code that you need:
colormap = plt.cm.gist_ncar #nipy_spectral, Set1,Paired
colors = [colormap(i) for i in np.linspace(0, 1,len(ax1.lines))]
for i,j in enumerate(ax1.lines):
j.set_color(colors[i])
ax1.legend(loc=2)
The result is the following:
TL;DR No, it can't be done automatically. Yes, it is possible.
import matplotlib.pyplot as plt
my_colors = plt.rcParams['axes.prop_cycle']() # <<< note that we CALL the prop_cycle
fig, axes = plt.subplots(2,3)
for ax in axes.flatten(): ax.plot((0,1), (0,1), **next(my_colors))
Each plot (axes) in a figure (figure) has its own cycle of colors — if you don't force a different color for each plot, all the plots share the same order of colors but, if we stretch a bit what "automatically" means, it can be done.
The OP wrote
[...] I have to identify each plot with a different color which should be automatically generated by [Matplotlib].
But... Matplotlib automatically generates different colors for each different curve
In [10]: import numpy as np
...: import matplotlib.pyplot as plt
In [11]: plt.plot((0,1), (0,1), (1,2), (1,0));
Out[11]:
So why the OP request? If we continue to read, we have
Can you please give me a method to put different colors for different plots in the same figure?
and it make sense, because each plot (each axes in Matplotlib's parlance) has its own color_cycle (or rather, in 2018, its prop_cycle) and each plot (axes) reuses the same colors in the same order.
In [12]: fig, axes = plt.subplots(2,3)
In [13]: for ax in axes.flatten():
...: ax.plot((0,1), (0,1))
If this is the meaning of the original question, one possibility is to explicitly name a different color for each plot.
If the plots (as it often happens) are generated in a loop we must have an additional loop variable to override the color automatically chosen by Matplotlib.
In [14]: fig, axes = plt.subplots(2,3)
In [15]: for ax, short_color_name in zip(axes.flatten(), 'brgkyc'):
...: ax.plot((0,1), (0,1), short_color_name)
Another possibility is to instantiate a cycler object
from cycler import cycler
my_cycler = cycler('color', ['k', 'r']) * cycler('linewidth', [1., 1.5, 2.])
actual_cycler = my_cycler()
fig, axes = plt.subplots(2,3)
for ax in axes.flat:
ax.plot((0,1), (0,1), **next(actual_cycler))
Note that type(my_cycler) is cycler.Cycler but type(actual_cycler) is itertools.cycle.
I would like to offer a minor improvement on the last loop answer given in the previous post (that post is correct and should still be accepted). The implicit assumption made when labeling the last example is that plt.label(LIST) puts label number X in LIST with the line corresponding to the Xth time plot was called. I have run into problems with this approach before. The recommended way to build legends and customize their labels per matplotlibs documentation ( http://matplotlib.org/users/legend_guide.html#adjusting-the-order-of-legend-item) is to have a warm feeling that the labels go along with the exact plots you think they do:
...
# Plot several different functions...
labels = []
plotHandles = []
for i in range(1, num_plots + 1):
x, = plt.plot(some x vector, some y vector) #need the ',' per ** below
plotHandles.append(x)
labels.append(some label)
plt.legend(plotHandles, labels, 'upper left',ncol=1)
**: Matplotlib Legends not working
Matplot colors your plot with different colors , but incase you wanna put specific colors
import matplotlib.pyplot as plt
import numpy as np
x = np.arange(10)
plt.plot(x, x)
plt.plot(x, 2 * x,color='blue')
plt.plot(x, 3 * x,color='red')
plt.plot(x, 4 * x,color='green')
plt.show()
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import numpy as np
from skspatial.objects import Line, Vector
for count in range(0,len(LineList),1):
Line_Color = np.random.rand(3,)
Line(StartPoint,EndPoint)).plot_3d(ax,c="Line"+str(count),label="Line"+str(count))
plt.legend(loc='lower left')
plt.show(block=True)
The above code might help you to add 3D lines with different colours in a randomized fashion. Your colored lines can also be referenced with a help of a legend as mentioned in the label="... " parameter.
Honestly, my favourite way to do this is pretty simple: Now this won't work for an arbitrarily large number of plots, but it will do you up to 1163. This is by using the map of all matplotlib's named colours and then selecting them at random.
from random import choice
import matplotlib.pyplot as plt
from matplotlib.colors import mcolors
# Get full named colour map from matplotlib
colours = mcolors._colors_full_map # This is a dictionary of all named colours
# Turn the dictionary into a list
color_lst = list(colours.values())
# Plot using these random colours
for n, plot in enumerate(plots):
plt.scatter(plot[x], plot[y], color=choice(color_lst), label=n)

Fitting The Theoretical Equation To My Data

I am very, very new to python, so please bear with me, and pardon my naivety. I am using Spyder Python 2.7 on my Windows laptop. As the title suggests, I have some data, a theoretical equation, and I am attempting to fit my data, with what I believe is the Chi-squared fit. The theoretical equation I am using is
import math
import numpy as np
import scipy.optimize as optimize
import matplotlib.pylab as plt
import csv
#with open('1.csv', 'r') as datafile:
# datareader = csv.reader(datafile)
# for row in datareader:
# print ', '.join(row)
t_y_data = np.loadtxt('exerciseball.csv', dtype=float, delimiter=',', usecols=(1,4), skiprows = 1)
print(t_y_data)
t = t_y_data[:,0]
y = t_y_data[:,1]
gamma0 = [.1]
sigma = [(0.345366)/2]*(len(t))
#len(sigma)
#print(sigma)
#print(len(sigma))
#sigma is the error in our measurements, which is the radius of the object
# Dragfunction is the theoretical equation of the position as a function of time when the thing falling experiences a drag force
# This is the function we are trying to fit to our data
# t is the independent variable time, m is the mass, and D is the Diameter
#Gamma is the value of which python will vary, until chi-squared is a minimum
def Dragfunction(x, gamma):
print x
g = 9.8
D = 0.345366
m = 0.715
# num = math.sqrt(gamma)*D*g*x
# den = math.sqrt(m*g)
# frac = num/den
# print "frac", frac
return ((m)/(gamma*D**2))*math.log(math.cosh(math.sqrt(gamma/m*g)*D*g*t))
optimize.curve_fit(Dragfunction, t, y, gamma0, sigma)
This is the error message I am getting:
return ((m)/(gamma*D**2))*math.log(math.cosh(math.sqrt(gamma/m*g)*D*g*t))
TypeError: only length-1 arrays can be converted to Python scalars
My professor and I have spent about three or four hours trying to fix this. He helped me work out a lot of the problems, but this we can't seem to resolve.
Could someone please help? If there is any other information you need, please let me know.
Your error message comes from the fact that those math functions only accept a scalar, so to call functions on an array, use the numpy versions:
In [82]: a = np.array([1,2,3])
In [83]: np.sqrt(a)
Out[83]: array([ 1. , 1.41421356, 1.73205081])
In [84]: math.sqrt(a)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
----> 1 math.sqrt(a)
TypeError: only length-1 arrays can be converted to Python scalars
In the process, I happened to spot a mathematical error in your code. Your equation at top says that g is in the bottom of the square root inside the log(cosh()), but you've got it on the top because a/b*c == a*c/b in python, not a/(b*c)
log(cosh(sqrt(gamma/m*g)*D*g*t))
should instead be any one of these:
log(cosh(sqrt(gamma/m/g)*D*g*t))
log(cosh(sqrt(gamma/(m*g))*D*g*t))
log(cosh(sqrt(gamma*g/m)*D*t)) # the simplest, by canceling with the g from outside sqrt
A second error is that in your function definition, you have the parameter named x which you never use, but instead you're using t which at this point is a global variable (from your data), so you won't see an error. You won't see an effect using curve_fit since it will pass your t data to the function anyway, but if you tried to call the Dragfunction on a different data set, it would still give you the results from the t values. Probably you meant this:
def Dragfunction(t, gamma):
print t
...
return ... D*g*t ...
A couple other notes as unsolicited advice, since you said you were new to python:
You can load and "unpack" the t and y variables at once with:
t, y = np.loadtxt('exerciseball.csv', dtype=float, delimiter=',', usecols=(1,4), skiprows = 1, unpack=True)
If your error is constant, then sigma has no effect on curve_fit, as it only affects the relative weighting for the fit, so you really don't need it at all.
Below is my version of your code, with all of the above changes in place.
import numpy as np
from scipy import optimize # simplified syntax
import matplotlib.pyplot as plt # pylab != pyplot
# `unpack` lets you split the columns immediately:
t, y = np.loadtxt('exerciseball.csv', dtype=float, delimiter=',',
usecols=(1, 4), skiprows=1, unpack=True)
gamma0 = .1 # does not need to be a list
def Dragfunction(x, gamma):
g = 9.8
D = 0.345366
m = 0.715
gammaD_m = gamma*D*D/m # combination is used twice, only calculate once for (small) speedup
return np.log(np.cosh(np.sqrt(gammaD_m*g)*t)) / gammaD_m
gamma_best, gamma_var = optimize.curve_fit(Dragfunction, t, y, gamma0)

How to set offset with matplotlib

I'm trying to remove the offset that matplotlib automatically put on my graphs. For example, with the following code:
x=np.array([1., 2., 3.])
y=2.*x*1.e7
MyFig = plt.figure()
MyAx = MyFig.add_subplot(111)
MyAx.plot(x,y)
I obtain the following result (sorry, I cannot post image): the y-axis have the ticks 2, 2.5, 3, ..., 6, with a unique "x10^7" at the top of the y axis.
I would like to remove the "x10^7" from the top of the axis, and making it appearing with each tick (2x10^7, 2.5x10^7, etc...). If I understood well what I saw in other topics, I have to play with the use_Offset variable. So I tried the following thing:
MyFormatter = MyAx.axes.yaxis.get_major_formatter()
MyFormatter.useOffset(False)
MyAx.axes.yaxis.set_major_formatter(MyFormatter)
without any success (result unchanged).
Am I doing something wrong? How can I change this behaviour? Or have I to manually set the ticks ?
Thanks by advance for any help !
You can use the FuncFormatter from the ticker module to format the ticklabels as you please:
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import FuncFormatter
x=np.array([1., 2., 3.])
y=2.*x*1.e7
MyFig = plt.figure()
MyAx = MyFig.add_subplot(111)
def sci_notation(x, pos):
return "${:.1f} \\times 10^{{6}}$".format(x / 1.e7)
MyFormatter = FuncFormatter(sci_notation)
MyAx.axes.yaxis.set_major_formatter(MyFormatter)
MyAx.plot(x,y)
plt.show()
On a side note; the "x10^7" value that appears at the top of your axis is not an offset, but a factor used in scientific notation. This behavior can be disabled by calling MyFormatter.use_scientific(False). Numbers will then be displayed as decimals.
An offset is a value you have to add (or subtract) to the tickvalues rather than multiply with, as the latter is a scale.
For reference, the line
MyFormatter.useOffset(False)
should be
MyFormatter.set_useOffset(False)
as the first one is a bool (can only have the values True or False), which means it can not be called as a method. The latter is the method used to enable/disable the offset.