Data format for gnuplot plot - fortran

I have a data file coming from a Fortran code. The data are two arrays v and np of size 500 and a scalar time.
At each time step I append the new time value and the two vectors to the file as two new lines, in this format:
time, v(1), v(2), v(3), ..., v(499), v(500)
time, np(1), np(2), np(3), ..., np(499), np(500)
For example:
0.0, 1.0, 2.0, 3.0, ..., 499.0, 500.0
0.0, 0.1, 0.2, 0.3, ..., 0.499, 0.500
1.0, 1.0, 2.0, 3.0, ..., 499.0, 500.0
1.0, 0.1, 0.2, 0.3, ..., 0.499, 0.500
2.0, 1.0, 2.0, 3.0, ..., 499.0, 500.0
2.0, 0.1, 0.2, 0.3, ..., 0.499, 0.500
What I want is to plot np as a function of v at a specific time (so in this case, if I want time=2, I will plot lines 5 and 6, ignoring the first value of each line, which is the time). However, gnuplot doesn't like this format. I made it work using Python, but I must do it with gnuplot.
I searched online and found that I could output my data in another format, but this doesn't work properly either. That format looks like this:
0.0 0.0
1.0 0.1
2.0 0.2
3.0 0.3
4.0 0.4
... ...
499.0 0.499
500.0 0.500
1.0 1.0
1.0 0.1
2.0 0.2
3.0 0.3
4.0 0.4
... ...
499.0 0.499
500.0 0.500
2.0 2.0
1.0 0.1
2.0 0.2
3.0 0.3
4.0 0.4
... ...
499.0 0.499
500.0 0.500
This format plots everything, including the time values, and I couldn't get it to work even with for loops and every function I tried.
I also searched for a way to format my data in columns in Fortran, but I couldn't find a solution. The problem I have is that at each time step the arrays v and np are erased from memory, and for reasons I can't explain, I can't save v and np in a matrix for later.
Is there a way I can format my data in Fortran so that I can read it in gnuplot and plot only one time?
Or is there a way I can read this format using only gnuplot?

gnuplot doesn't like data in rows.
You could transpose your data with an external tool and then simply plot it as columns with gnuplot. Unfortunately, gnuplot has no transpose function itself; in principle you could also transpose within gnuplot (https://stackoverflow.com/a/65576405/7295599), but that is probably highly inefficient for large datasets.
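For example, since Python is already available as an external tool, a transpose step might look roughly like this (a sketch; data.csv and data_transposed.dat are placeholder file names for the comma-separated Fortran output):
import csv

# read the comma-separated rows written by the Fortran code
with open("data.csv") as f:
    rows = [[field.strip() for field in row] for row in csv.reader(f)]

# zip(*rows) turns rows into columns; write them space-separated for gnuplot
with open("data_transposed.dat", "w") as out:
    for col in zip(*rows):
        out.write(" ".join(col) + "\n")
After transposing, the two rows for time=2 become columns 5 and 6, so something like plot 'data_transposed.dat' every ::1 u 5:6 w lp should work; every ::1 skips the first line, which holds the time values.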
Actually, here is an awkward gnuplot solution to plot data from two rows.
The values of the rows of interest are stored in arrays (hence requires gnuplot >=5.2.0) using a dummy table. The option every ::SkipCols skips the first SkipCols columns. In your case SkipCols=1 which skips the time values.
Maybe somebody can simplify this approach further.
Code:
### plotting a row versus another row (requires gnuplot >=5.2.0)
reset session
$Data <<EOD
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7
0.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7
1.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7
1.0 3.1 3.2 3.3 3.4 3.5 3.6 3.7
2.0 4.1 4.2 4.3 4.4 4.5 4.6 4.7
2.0 5.1 5.2 5.3 5.4 5.5 5.6 5.7
EOD
myXrow = 2 # counting starts from 0
myYrow = 3 #
SkipCols = 1 # skip number of columns from the beginning
stats $Data u 0 nooutput # get the number of columns
array X[STATS_columns-SkipCols] # define array size
array Y[STATS_columns-SkipCols] #
myX(row) = ($2==row ? X[$1-SkipCols+1]=$3 : NaN)
myY(row) = ($2==row ? Y[$1-SkipCols+1]=$3 : NaN)
# put the x,y rows into arrays
set table $Dummy
plot $Data matrix u (myX(myXrow),myY(myYrow)) every :myYrow-myXrow:SkipCols:myXrow::myXrow+myYrow w table
unset table
undef $Dummy
set key noautotitle
plot X u 2:(Y[$1]) w lp pt 7
### end of code
Result:
Addition: (Version for gnuplot 5.0)
Here is a version for gnuplot 5.0. Although datablocks were introduced in gnuplot 5.0, you cannot address them by index as you can in gnuplot 5.2. So this workaround uses strings to store the rows and then prints them back into a datablock.
Not very elegant and probably not efficient, but it seems to work.
Unless there is a limit on string length, it should also work for your 500 columns. Since your data uses a comma as separator, you have to set datafile separator comma and later set datafile separator whitespace again. The code can probably still be optimized.
Code: (Result same as above)
### plotting a row versus another row (working for gnuplot 5.0)
reset session
$Data <<EOD
0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7
0.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7
1.0, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7
1.0, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7
2.0, 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7
2.0, 5.1, 5.2, 5.3, 5.4, 5.5, 5.6, 5.7
EOD
myXrow = 2 # counting starts from 0
myYrow = 3
set datafile separator comma
X = Y = ''
AddValue(S,row) = S.($2==row ? sprintf(" %g",$3) : '')
set table $Dummy
plot $Data matrix u (X=AddValue(X,myXrow),Y=AddValue(Y,myYrow)) every :myYrow-myXrow:1:myXrow::myXrow+myYrow
unset table
undef $Dummy
set print $DataNew
do for [i=1:words(X)] { print sprintf("%s %s",word(X,i),word(Y,i)) }
set print
set datafile separator whitespace
set key noautotitle
plot $DataNew u 1:2 w lp pt 7
### end of code

Related

pyplot - yticks data representation help - need to convert to KB/MB

I am trying to plot a graph for throughput numbers.
My data is: x axis = time in epoch, y = throughput in bytes.
I have y-ticks as
print loc, labels
[       0.  5000000. 10000000. 15000000. 20000000. 25000000. 30000000. 35000000.]
<a list of 8 Text yticklabel objects>
I want to show this data in KB or MB. How can I go about it? I am lost and stuck.
Currently the data on the y axis runs from 0 to 3.5 (times 1e7), which by itself does not read well as a throughput.
So the y ticks are 0, 0.5, 1, 1.5, 2, 2.5, 3, 3.5 with a 1e7 multiplier.
Appreciate the help!
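For reference, a common way to do this in matplotlib is a FuncFormatter on the y axis (a minimal sketch; times and throughput_bytes are placeholder names for the actual data, and 1 MB is taken here as 1e6 bytes):
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

fig, ax = plt.subplots()
ax.plot(times, throughput_bytes)  # placeholder names for the poster's data
# show each tick value in MB instead of raw bytes
ax.yaxis.set_major_formatter(FuncFormatter(lambda y, pos: '%.1f MB' % (y / 1e6)))
plt.show()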

numpy arange for floating point start & stop

I want numpy to create a full list, given the parameters start, stop and increment, but ran into some trouble:
In[2]: import numpy as np
In[3]: np.arange(2.0, 2.4, 0.2)
Out[3]: array([ 2. , 2.2])
In[4]: np.arange(2.0, 2.6, 0.2)
Out[4]: array([ 2. , 2.2, 2.4, 2.6])
In[5]: np.arange(2.0, 2.8, 0.2)
Out[5]: array([ 2. , 2.2, 2.4, 2.6])
What I actually want is:
array([ 2. , 2.2, 2.4])
Now, I've learned that I should avoid the floating point data type if it comes down to fixed values. I know it would be better to multiply start/stop/increment by 100, but the problem is that I cannot tell how many decimals the user is going to supply. Is there any way I can still do this with floats, or is there a better way to solve it?
Edit:
It works now with the obvious solution of adding 0.0000001 to the end value, but this looks horrible in my code... I'd like to fix it more cleanly somehow.
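In code, that workaround might look roughly like this (a sketch; inclusive_arange is just an illustrative name, and padding by half a step instead of a fixed 0.0000001 keeps it independent of the step size):
import numpy as np

def inclusive_arange(start, stop, step):
    # pad the stop value by half a step so a stop that lies on the grid is included
    return np.arange(start, stop + step / 2.0, step)

# inclusive_arange(2.0, 2.4, 0.2) -> array([2. , 2.2, 2.4])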
Could you specify which values the user is supposed to enter? For that kind of generation, I think linspace could be better, as it includes the end parameter.
EDIT: if the user enters start, end, and increment, just use linspace with num = int((end-start)/increment+1), if the exact value of the increment is not critical.
EDIT2:
adapt 1e-4 to the relative error you deem acceptable (you can even add it as a user-defined variable).
import numpy as np

eps = 1e-4 * (stop - start)
num = int((stop - start) / (incr - eps) + 1)
np.linspace(start, stop, num)  # e.g. start=2.0, stop=2.4, incr=0.2 -> array([2. , 2.2, 2.4])
This might seem a little longer, but if you are keen on using np.arange, this is how I worked it out:
import decimal
import numpy as np

# scale MIN, MAX and STEP up to integers so arange works on exact values
decimal_places = decimal.Decimal(str(STEP)).as_tuple().exponent
power_10_multiplier = 10**-decimal_places
MIN = int(round(MIN*power_10_multiplier))   # round before int to avoid float truncation
MAX = int(round(MAX*power_10_multiplier))
STEP = int(round(STEP*power_10_multiplier))
arr = np.arange(MIN, MAX + STEP, step=STEP)/power_10_multiplier

Replicate IDL 'smooth' in Python 2.7

I have been trying to work out how to replicate IDL's smooth function in Python and I just can't get anything like the same results. (Disclaimer: It is probably 10 years since I touched this kind of mathematical problem so it has been dumped to make way for information like where to find the cheapest local fuel). I am trying to code this:
smooth(b,w,/nan)
where b is a 2D float array containing NaNs (zeros - missing data - have also been converted to NaN).
From the IDL documents, it appears smooth uses a boxcar, so from scipy.ndimage.filters I have tried:
bsmooth = uniform_filter(b, w)
I am aware that there are some fundamental differences here:
The default edge behaviour from IDL is "the end points are copied from the original array to the result with no smoothing", whereas I don't seem to have the option to do this with the uniform filter.
Treatment of the NaN elements: in IDL, the /nan keyword seems to mean that where possible the NaN values will be filled by the result of the other points in the window; if there are no valid points to generate a result, the value is set by the MISSING keyword. I thought I could approximate this behaviour after the smoothing using scipy.interpolate's NearestNDInterpolator (thanks to the brilliant explanation by Alex here: filling gaps on an image using numpy and scipy).
Here is my test array:
>>> b
array([[ 0.97599638,  0.93114936,  0.87070072,  0.5379253 ],
       [ 0.34873217,         nan,  0.40985891,  0.22407863],
       [        nan,         nan,         nan,  0.67532134],
       [        nan,         nan,  0.85441768,         nan]])
My answers bore not the SLIGHTEST resemblance to IDL, whether I use the /nan keyword or not.
IDL> smooth(b,2,/nan)
0.97599638 0.93114936 0.87070072 0.53792530
0.34873217 0.70728749 0.60817236 0.22407863
NaN 0.53766960 0.54091913 0.67532134
NaN NaN 0.85441768 NaN
IDL> smooth(b,2)
0.97599638 0.93114936 0.87070072 0.53792530
0.34873217 -NaN -NaN 0.22407863
-NaN -NaN -NaN 0.67532134
-NaN -NaN 0.85441768 NaN
I confess I find the scipy documentation rather sparse on detail, so I have no idea if I am really doing what I think I am doing. The fact that the two python approaches which I believed would both smooth the image give different answers suggests that things are not what I understood them to be.
>>>uniform_filter(b, 2)
array([[ 0.97599638, 0.95357287, 0.90092504, 0.70431301],
[ 0.66236428, nan, nan, nan],
[ nan, nan, nan, nan],
[ nan, nan, nan, nan]])
I thought it was a bit odd it was so empty so I tried this with an array of 100 elements (still using a window of 2) and output the images. The results (first image is 'b' second is 'bsmooth') are not quite what I was hoping for:
Going back to the smaller array and following the examples in: http://scipy.github.io/old-wiki/pages/Cookbook/SignalSmooth which I thought would give the same output as uniform_filter, I tried:
>>> box = np.array([1,1,1,1])
>>> box = box.reshape(2,2)
>>> box
array([[1, 1],
[1, 1]])
>>> bsmooth = scipy.signal.convolve2d(b,box,mode='same')
>>> print bsmooth
[[ 0.97599638 1.90714574 1.80185008 1.40862602]
[ 1.32472855 nan nan 2.04256356]
[ nan nan nan nan]
[ nan nan nan nan]]
Obviously I have completely misunderstood the scipy functions, maybe even the IDL one. If anyone can help me to replicate the IDL smooth function as closely as possible, I would be extremely grateful. I am under considerable time pressure to get a solution for this that doesn't rely on IDL and I am tossing a coin to decide whether to code the function from scratch or develop a very contagious illness.
How can I perform the same smoothing in python?
First: please use matplotlib.pyplot.imshow with interpolation="none"; that's nicer to look at, and maybe use greyscale.
So for your example: there is actually no convolution (filter) within scipy or numpy that treats NaNs as missing values (they propagate them through the convolution). At least I've found none so far, and your boundary treatment is also (to my knowledge) not implemented anywhere. But the boundary can simply be replaced afterwards.
If you want to do convolution with NaNs you can, for example, use astropy.convolution.convolve. There, NaNs are interpolated using the kernel of your filter. But their convolution has some drawbacks as well: border handling like you want isn't implemented there either, your kernel must have an odd shape, and the sum of your kernel must not be zero (or very close to it).
For example:
from astropy.convolution import convolve
import numpy as np
array = np.random.uniform(10,100, (4,4))
array[1,1] = np.nan
kernel = np.ones((3,3))
convolve(array, kernel)
As an example, an initial array of
array([[ 97.19514587, 62.36979751, 93.54811286, 30.23567842],
[ 51.02184613, nan, 46.14769821, 60.08088041],
[ 20.86482452, 42.39661484, 36.96961278, 96.89180175],
[ 45.54453509, 76.61274347, 46.44485141, 25.40985372]])
will become:
array([[ 266.9009961 , 406.59680717, 348.69637399, 230.01236989],
[ 330.16243546, 506.82785931, 524.95440336, 363.87378443],
[ 292.75477064, 422.31693304, 487.26826319, 311.94469828],
[ 185.41871792, 268.83318211, 324.72547798, 205.71611967]])
if you want to "normalize" it, astropy offers the normalize_kernel parameter:
convolved = convolve(array, kernel, normalize_kernel=True)
array([[ 29.58753936, 42.09982189, 49.31793529, 33.00203873],
[ 49.87040638, 65.67695002, 66.10447436, 40.44026448],
[ 52.51126383, 63.03914444, 60.85474739, 35.88011742],
[ 39.40188443, 46.82350749, 40.1380926 , 22.46090152]])
If you want to replace the "edge" values with the ones from the original array just replace them:
convolved[0,:] = array[0,:]
convolved[-1,:] = array[-1,:]
convolved[:,0] = array[:,0]
convolved[:,-1] = array[:,-1]
So that's what the existing packages offer (as far as I know). If you want to learn a bit of Cython or numba you can easily write your own convolution that is not much slower (only a factor of 2-10) than the numpy/scipy ones but does EXACTLY what you want without messing around.
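For instance, the numba route mentioned above might look roughly like this (a sketch, assuming an odd window width w; NaNs are skipped inside each window, loosely mimicking the /nan behaviour, and the border cells simply keep their original values):
import numpy as np
from numba import njit

@njit
def nan_boxcar(a, w):
    half = w // 2
    out = a.copy()  # border cells keep the original values
    for i in range(half, a.shape[0] - half):
        for j in range(half, a.shape[1] - half):
            total = 0.0
            count = 0
            for di in range(-half, half + 1):
                for dj in range(-half, half + 1):
                    v = a[i + di, j + dj]
                    if not np.isnan(v):  # skip NaNs instead of propagating them
                        total += v
                        count += 1
            if count > 0:
                out[i, j] = total / count
    return out
Dropping the isnan check gives the propagating variant instead.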
Since this is not something that is available in the python packages and because I saw the question asked several times during my research without satisfactory answers, here is how I solved the issue.
Provided is a test version of my function that I'm off to tidy up. I am sure there will be better ways to do the things I have done as I'm still fairly new to Python - please do recommend any appropriate changes.
Plots use autumn colourmap just because it allowed me to see the NaNs clearly.
My results:
IDL propagate
0.033369284 0.067915268 0.96602046 0.85623550
0.30435592 NaN NaN 100.00000
0.94065958 NaN NaN 0.90966976
0.018516513 0.044460904 0.051047217 NaN
python propagate
[[ 3.33692829e-02 6.79152655e-02 9.66020487e-01 8.56235492e-01]
[ 3.04355923e-01 nan nan 1.00000000e+02]
[ 9.40659566e-01 nan nan 9.09669768e-01]
[ 1.85165123e-02 4.44609040e-02 5.10472165e-02 nan]]
IDL replace
0.033369284 0.067915268 0.96602046 0.85623550
0.30435592 0.47452110 14.829881 100.00000
0.94065958 0.33833817 17.002417 0.90966976
0.018516513 0.044460904 0.051047217 NaN
python replace
[[ 3.33692829e-02 6.79152655e-02 9.66020487e-01 8.56235492e-01]
[ 3.04355923e-01 4.74521092e-01 1.48298812e+01 1.00000000e+02]
[ 9.40659566e-01 3.38338177e-01 1.70024175e+01 9.09669768e-01]
[ 1.85165123e-02 4.44609040e-02 5.10472165e-02 nan]]
My function:
#!/usr/bin/env python
# smooth.py
__version__ = 0.1
# Version 0.1 29 Feb 2016 ELH Test release

import numpy as np
import matplotlib.pyplot as mp

def Smooth(v1, w, nanopt):
    # v1 is the input 2D numpy array.
    # w is the width of the square window along one dimension
    # nanopt can be replace or propagate
    '''
    v1 = np.array(
        [[3.33692829e-02, 6.79152655e-02, 9.66020487e-01, 8.56235492e-01],
         [3.04355923e-01, np.nan        , 4.86013025e-01, 1.00000000e+02],
         [9.40659566e-01, 5.23314093e-01, np.nan        , 9.09669768e-01],
         [1.85165123e-02, 4.44609040e-02, 5.10472165e-02, np.nan        ]])
    w = 2
    '''
    mp.imshow(v1, interpolation='None', cmap='autumn')
    mp.show()

    # make a copy of the array for the output:
    vout = np.copy(v1)
    # If w is even, add one
    if w % 2 == 0:
        w = w + 1
    # get the size of each dim of the input:
    r, c = v1.shape
    # Assume that w, the width of the window, is always square.
    startrc = (w - 1)/2
    stopr = r - ((w + 1)/2) + 1
    stopc = c - ((w + 1)/2) + 1
    # For all pixels within the border defined by the box size, calculate
    # the average in the window. There are two options:
    #   Ignore NaNs and replace the value where possible.
    #   Propagate the NaNs.
    for col in range(startrc, stopc):
        # Calculate the window start and stop columns
        startwc = col - (w/2)
        stopwc = col + (w/2) + 1
        for row in range(startrc, stopr):
            # Calculate the window start and stop rows
            startwr = row - (w/2)
            stopwr = row + (w/2) + 1
            # Extract the window
            window = v1[startwr:stopwr, startwc:stopwc]
            if nanopt == 'replace':
                # If we're replacing NaNs, then select only the finite elements
                window = window[np.isfinite(window)]
            # Calculate the mean of the window
            vout[row, col] = np.mean(window)

    mp.imshow(vout, interpolation='None', cmap='autumn')
    mp.show()
    return vout
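A usage sketch (assuming the test array v1 from the docstring above):
result_propagate = Smooth(v1, 2, 'propagate')
result_replace = Smooth(v1, 2, 'replace')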

How do I find many maximums in a list in Mathematica

I have some data that I'm trying to analyze, and it has many cycles where it returns to a maximum value. I want to be able to select and pull out all of those maximum values and make a trend line to see if it has good durability.
My question is much like This Question, but my segments are not uniform.
The data is stored in a tab-delimited format {Timestamp, data, data, data, data, data, cycle#, boolean}.
I've gotten it to pull out each cycle with this code, but how do I get the maximum at the same time?
(* Importing the list *)
SetDirectory[NotebookDirectory[]]
rawl = Import["SU8-20-50psi-6-29.txt", "TSV"];
date = {rawl[[4]][[1]]}
pressure = {rawl[[4]][[2]]};
forwardflow = {rawl[[4]][[3]]};
backwashflow = {rawl[[4]][[4]]};
forwardpressure = {rawl[[4]][[5]]};
backwashpressure = {rawl[[4]][[6]]};
cycles = {rawl[[4]][[7]]};
backwash = {rawl[[4]][[8]]};
length = Length[rawl]
iter = 4;
While[iter < length,
iter = iter + 1;
AppendTo[date, rawl[[iter]][[1]]];
AppendTo[pressure, rawl[[iter]][[2]]];
AppendTo[forwardflow, rawl[[iter]][[3]]];
AppendTo[backwashflow, rawl[[iter]][[4]]];
AppendTo[forwardpressure, rawl[[iter]][[5]]];
AppendTo[backwashpressure, rawl[[iter]][[6]]];
AppendTo[cycles, rawl[[iter]][[7]]];
AppendTo[backwash, rawl[[iter]][[8]]]]
Select[rawl, #[[7]] == 1 &]
I'm looking for a maximum in the 3rd data point
here is a sample of the data file
2015-06-30 16:11:15.628563 0.5 0.7 0.0 11.1 41.2 0 False
2015-06-30 16:11:15.889830 0.9 0.3 0.0 7.7 42.6 0 False
2015-06-30 16:11:16.090567 1.5 0.6 0.0 5.3 43.2 0 True
2015-06-30 16:11:16.338970 1.4 1.0 0.0 7.2 43.2 0 True
2015-06-30 16:11:16.456993 1.4 1.4 0.0 9.6 43.2 0 True
2015-06-30 16:11:16.580034 1.4 1.0 0.0 11.6 43.7 0 True
2015-06-30 16:11:16.692873 1.5 1.0 0.0 13.7 43.7 0 True
2015-06-30 16:11:16.804827 1.5 0.6 0.0 15.0 43.6 1 False
2015-06-30 16:11:16.937007 1.6 0.4 0.0 15.7 43.7 1 True
2015-06-30 16:11:17.047861 1.6 0.0 0.0 15.8 43.6 1 True
2015-06-30 16:11:17.158619 1.6 0.0 0.0 15.8 43.7 1 True
2015-06-30 16:11:17.293030 1.5 0.0 0.0 15.7 43.9 1 True
2015-06-30 16:11:17.404268 1.5 0.0 0.0 15.7 44.0 1 True
2015-06-30 16:11:17.514991 1.5 0.0 0.0 15.6 44.8 1 True
2015-06-30 16:11:17.650058 1.5 0.0 0.0 15.7 44.7 1 True
2015-06-30 16:11:17.761827 1.5 0.0 0.0 15.7 44.7 1 True
2015-06-30 16:11:17.872931 1.8 0.0 0.0 15.7 44.1 2 False
2015-06-30 16:11:18.112676 0.4 0.0 0.0 15.0 42.4 2 False
<<< EDIT >>> Here is my updated code that I have been trying, but I can't quite get it to work:
groups = Split[rawl, #1[[7]] == #2[[7]] &]; (* this works great*)
group = Max[groups[[3]][[All, 3]]] (*This works too*)
Map[Max, groups[[#]][[All, 3]]] & (*So why won't these work?*)
Transpose[MapAt[Max /@ # &, Transpose[groups], 3]]
Thank you for the sample data, that always helps.
I'm not certain I understand all your question, but perhaps this will let you explain what I am missing.
This
rawl = Import["psi.txt", "TSV"];
rawl = Drop[rawl, 3];(*drop unwanted header rows?*)
{date, pressure, forwardflow, backwashflow, forwardpressure,
backwashpressure, cycles, backwash} = Transpose[rawl]
extracts the columns and stores your data in your variables.
This
Select[rawl, #[[7]] == 1&]
extracts those rows with cycles==1.
This extracts a maximum forward flow
maxff = Max @@ forwardflow
and this
Select[rawl, #[[3]] == maxff&]
finds all rows with forwardflow equal to that maximum.
If you can clarify what you need the final step to do then I'll try to finish this.
<<< EDIT >>>
Your latest description of the process is MUCH more helpful.
So here is my thinking:
1: You want to group items having the same number in the 7th column. Look up Split in the help pages. Look in the examples to see how to group on one particular column when you have rows of multiple columns. That will give you each group inside an extra layer of {} and all the groups will be inside an outer layer of {}.
2: Now you want to "do the same thing to every item", where each item is your group of rows from step 1. In Mathematica this is often best done with the Map function. Look that up in the help pages. The thing you want to do is the Max function and you need to see how to use subscripting to get the third column for that. This should give you the list of maxima. And Map will put all those results into a list for you so you get step 3 for free.
4: Then you can plot that resulting list.
So see if this gives you enough of an idea to get you started. If you can't figure out some part of the language then let me know and I'll give another hint.
Thanks to Bill, this is what I came up with; it works like a charm and is a LOT faster than anything with While loops.
(*import the data*)
rawl = Import["SU8-MA7.txt", "TSV"];
rawl = Drop[rawl, 3];(*drop unwanted header rows*)
{date, pressure, forwardflow, backwashflow, forwardpressure, backwashpressure, cycles, backwash} = Transpose[rawl];
(*Split it into groups and get the maximum of each group*)
totalcycles = Max[cycles]
groups = Split[rawl, #1[[7]] == #2[[7]] &];
f[group_] := Max[groups[[group]][[All, 3]]]
maximum = Map[f, Range[1, totalcycles]];

Plot vertical arrows for my points

I am trying to figure out a way to add a vertical arrow pointing up for each of my data points. I have a scatter plot and code below. I need the vertical arrows to start from the points and go upwards to a length of about 0.2 in the graph scale.
import numpy as np
import matplotlib.pyplot as plt

fig = plt.figure()
a1 = fig.add_subplot(111)
simbh = np.array([5.3, 5.3, 5.5, 5.6, 5.6, 5.8, 5.9, 6.0, 6.2, 6.3, 6.3])
simstel = np.array([10.02, 10.08, 9.64, 9.53, 9.78, 9.65, 10.05, 10.09, 10.08, 10.22, 10.42])
sca2 = a1.scatter(simstel, simbh)
This is a bit hacky; adjust arrow_offset and arrow_size until the figure looks right.
import numpy as np
import matplotlib.pyplot as plt

fig = plt.figure()
a1 = fig.add_subplot(111)
simbh = np.array([5.3, 5.3, 5.5, 5.6, 5.6, 5.8, 5.9, 6.0, 6.2, 6.3, 6.3])
simstel = np.array([10.02, 10.08, 9.64, 9.53, 9.78, 9.65, 10.05, 10.09, 10.08, 10.22, 10.42])
sca2 = a1.scatter(simstel, simbh, c='w')
arrow_offset = 0.08
arrow_size = 500
sca2 = a1.scatter(simstel, simbh + arrow_offset,
                  marker=r'$\uparrow$', s=arrow_size)
The other approaches presented are great. I'm going for the hackiest award today:
import numpy as np
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
simbh = np.array([5.3, 5.3, 5.5, 5.6, 5.6, 5.8, 5.9, 6.0, 6.2, 6.3, 6.3])
simstel = np.array([10.02, 10.08, 9.64, 9.53, 9.78, 9.65, 10.05, 10.09, 10.08, 10.22, 10.42])
sca2 = ax.scatter(simstel, simbh)
for x, y in zip(simstel, simbh):
    ax.annotate('', xy=(x, y), xytext=(0, 25), textcoords='offset points',
                arrowprops=dict(arrowstyle="<|-"))
This is not super elegant, but it does the trick. To get the arrows to start at the data point and go up 0.2 units:
for x, y in zip(simstel, simbh):
    plt.arrow(x, y, 0, 0.2)
This can be done directly:
from matplotlib import pyplot as plt
import numpy as np
# set up figure
fig, ax = plt.subplots()
# make synthetic data
x = np.linspace(0, 1, 15)
y = np.random.rand(15)
yerr = np.ones_like(x) * .2
# if you are using 1.3.1 or older you might need to use uplims to work
# around a bug, see below
ax.errorbar(x, y, yerr=yerr, lolims=True, ls='none', marker='o')
# adjust axis limits
ax.margins(.1) # margins makes the markers not overlap with the edges
There was some strangeness in how these arrows are implemented: the semantics changed so that 'lolims' means 'the data point is the lower limit' and 'uplims' means 'the data point is the maximum value'.
See https://github.com/matplotlib/matplotlib/pull/2452