How to create a serial from column values to calculate the slope in Google Sheets?

I have the following spreadsheet:
https://docs.google.com/spreadsheets/d/1Ib2Do3htfRg3NAuI-HyRA3MBM1XwUviFcAxlvF7q1J0/edit?usp=sharing
I have created 2 sparklines: one works and one doesn't. The one that does not work references the second column as the x-axis to calculate the slope. The slope is needed to color the graph according to its trend.
My question is, how can I convert the second column into a serial (a sequence such as [1, 2, 3, 4, 5]), so that when it is used as the x-axis, the slope is calculated correctly? Of course, this conversion needs to happen within the formula itself. Thanks for any help.

try:
=ARRAYFORMULA(SPARKLINE(C2:C, {
"charttype", "line";
"color", IF(SLOPE(C2:C, ROW(B2:B)-1)>0, "lime", "red");
"linewidth", 2}))

Related

numpy.transpose for producing mirror images

I am familiar with the numpy.transpose command and know that it is used to swap axes. But I am not familiar with mirror images: what they are, and how numpy.transpose is used to generate one. The following link says that when we swap the last two axes we get mirror images. What is meant by mirror images here? I would be really thankful if someone could explain this with a picture.
`a= np.arange(2*2*4).reshape(2,2,4)
b= np.transpose(a,(1,0,2))`
please look https://imgur.com/gallery/v6z7ah0
https://www.reddit.com/r/learnpython/comments/734lcl/complicated_numpy_transpose_question/?st=jij0av7a&sh=754dfd45
In [54]: a= np.arange(2*3*4).reshape(3,2,4)
# | | |
# axes 0 1 2
# new shape by moving the axes
In [54]: b= np.transpose(a,(1,0,2))
In [55]: a.shape
Out[55]: (3, 2, 4)
# first two axes are swapped
In [56]: b.shape
Out[56]: (2, 3, 4)
By default, np.transpose() reverses the order of the axes, so the shape is reversed. When an axes tuple is passed to np.transpose(), the axes are permuted into that order instead; the data is rearranged along the axes, not reshaped.
Explanation:
In the above example, np.transpose(a, (1, 0, 2)) means that in the returned array b, the zeroth and first axes would be swapped.
Specifically, the tuple that's passed to np.transpose() lists the axes of the input in the order they should appear in the result: axis i of the returned array is axis axes[i] of the input.
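The axis bookkeeping described above can be checked with a short sketch. It reuses the array from the example; the small 2-D array `img` is a made-up illustration, not taken from the linked images, and shows one way to read "mirror image": a 2-D transpose reflects the array across its main diagonal.

```python
import numpy as np

a = np.arange(2 * 3 * 4).reshape(3, 2, 4)
# Result axis 0 comes from a's axis 1, result axis 1 from a's axis 0.
b = np.transpose(a, (1, 0, 2))

# Swapping axes 0 and 1 moves element a[i, j, k] to b[j, i, k].
swapped_ok = all(
    b[j, i, k] == a[i, j, k]
    for i in range(3) for j in range(2) for k in range(4)
)

# With no axes argument, the axis order is simply reversed.
default_shape = np.transpose(a).shape

# 2-D case (hypothetical data): rows become columns, which is a
# reflection of the array across its main diagonal.
img = np.array([[1, 2, 3],
                [4, 5, 6]])
mirrored = img.T

print(b.shape, swapped_ok, default_shape)
print(mirrored)
```

Note that no values are changed or copied into a new layout by hand; only the mapping from indices to elements differs between `a` and `b`.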
Plotting the image before (left) and after transposing (right):

Jaccard similarity in python

I am trying to find the Jaccard similarity between two documents. However, I am having a hard time understanding how the function sklearn.metrics.jaccard_similarity_score() works behind the scenes. As per my understanding, Jaccard similarity = (intersection of the terms in the docs) / (union of the terms in the docs).
Consider below example:
My DTM for the two documents is:
array([[1, 1, 1, 1, 2, 0, 1, 0],
[2, 1, 1, 0, 1, 1, 0, 1]], dtype=int64)
The above function gives me the Jaccard similarity score:
print(sklearn.metrics.jaccard_similarity_score(tf_matrix[0,:],tf_matrix[1,:]))
0.25
I am trying to compute the score on my own as:
intersection of terms in both the docs = 4
total terms in doc 1 = 6
total terms in doc 2 = 6
Jaccard = 4/(6+6-4)= .5
Can someone please help me understand if there is something obvious I am missing here?
As stated here:
In binary and multiclass classification, the Jaccard similarity coefficient score is equal to the classification accuracy.
Therefore in your example it is calculating the proportion of matching elements. That's why you're getting 0.25 as the result.
In my view:
intersection of terms in both the docs = 2.
This is the position-by-position intersection: values are compared at their respective indices, since the score checks whether each value is predicted correctly.
The ordinary set intersection, which ignores index order, is 4.
# so,
jaccard_score = 2/(6+6-4) = 0.25
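Both numbers in this thread can be reproduced with plain NumPy. The sketch below is only an illustration of the two definitions, not scikit-learn's implementation; the names `doc1`, `doc2`, `terms1` and `terms2` are made up here, and a term is assumed to be "present" in a document when its count is non-zero.

```python
import numpy as np

# The two rows of the document-term matrix from the question.
doc1 = np.array([1, 1, 1, 1, 2, 0, 1, 0])
doc2 = np.array([2, 1, 1, 0, 1, 1, 0, 1])

# Proportion of positions where the two vectors hold the same value
# (this is what classification accuracy measures).
accuracy = float(np.mean(doc1 == doc2))

# Set-based Jaccard: treat each document as the set of term indices
# whose count is non-zero, then take |intersection| / |union|.
terms1 = set(np.flatnonzero(doc1).tolist())
terms2 = set(np.flatnonzero(doc2).tolist())
jaccard = len(terms1 & terms2) / len(terms1 | terms2)

print(accuracy)  # 0.25
print(jaccard)   # 0.5
```

The first value matches the 0.25 returned by the function, and the second matches the 0.5 computed by hand in the question.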

subplots only plotting 1 plot using pandas

I am trying to get two plots on one figure using matplotlib's subplots() command. I want the two plots to share an x-axis and have one legend for the whole plot. The code I have right now is:
observline = mlines.Line2D([], [], color=(1,0.502,0),\
markersize=15, label='Observed',linewidth=2)
wrfline=mlines.Line2D([], [], color='black',\
markersize=15, label='WRF',linewidth=2)
fig,axes=plt.subplots(2,1,sharex='col',figsize=(18,10))
df08.plot(ax=axes[0],linewidth=2, color=(1,0.502,0))\
.legend(handles=[observline,wrfline],loc='lower center', bbox_to_anchor=(0.9315, 0.9598),prop={'size':16})
axes[0].set_title('WRF Model Comparison Near %.2f,%.2f' %(lat,lon),fontsize=24)
axes[0].set_ylim(0,360)
axes[0].set_yticks(np.arange(0,361,60))
df18.plot(ax=axes[1],linewidth=2, color='black').legend_.remove()
plt.subplots_adjust(hspace=0)
axes[1].set_ylim(0,360)
axes[1].set_yticks(np.arange(0,361,60))
plt.ylabel('Wind Direction [Degrees]',fontsize=18,color='black')
axes[1].yaxis.set_label_coords(-0.05, 1)
plt.xlabel('Time',fontsize=18,color='black')
#plt.savefig(df8graphfile, dpi = 72)
plt.show()
and it produces four figures, each with two subplots. The top is always empty. The bottom is filled for three of them with my 2nd dataframe. The index of each dataframe is a DatetimeIndex in the format YYYY-mm-DD HH:MM:SS. The data are values from 0-360 distributed nearly randomly across the whole time series, which spans two months.
Here is an example of each figure produced:

Plot non present numbers with pandas

I have a large pandas Series, which contains unique numbers from 0 to 1,000,000. The series is not complete, but lacks some numbers in this range. I want to get a rough idea of what numbers are missing, so I'm thinking I should plot the data as a line with gaps showing the missing data.
How would I accomplish that? This does not work:
nums = pd.Series(myNumbers)
nums.plot()
The following provides a list of the missing numbers in Series nums. You can then plot them as needed. For your purposes adjust the max to 1E6.
import pandas as pd
max_value = 10 # highest number to look for in the Series
nums = pd.Series([1, 2, 3, 4, 5, 6, 9])
present = set(nums.values) # set lookups keep the scan fast for large ranges
missing = [n for n in range(max_value + 1) if n not in present]
print(missing)
# prints: [0, 7, 8, 10]
I think there are two concerns with the plotting function you wrote. First, there are one million numbers. Second, the x-axis for the plot will be indexes in the series (start at 0, going sequentially); the y-axis will be numbers that you care about (nums.values in the code here). Therefore, you are looking for missing y-axis values.
I think it depends on what you mean by missing. If those are NaNs, then you can do something like
nums.isna().sum()
If you are looking for numbers between 0 and 1M that are not present in the series, then do something like
a = set(range(int(1e6)))
b = set(nums.values)
print(len(a - b)) # or plot it as scatter.
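For a rough picture of where the gaps are, one option is to bin the missing numbers and count how many fall in each bin. This is a minimal sketch using the small example Series from above; `max_value` and the choice of 2 bins are assumptions made for illustration, and for the real data the bound would be 1,000,000 with many more bins.

```python
import numpy as np
import pandas as pd

max_value = 10  # assumed upper bound for this toy example
nums = pd.Series([1, 2, 3, 4, 5, 6, 9])

# Numbers in 0..max_value that never occur in the Series.
missing = np.setdiff1d(np.arange(max_value + 1), nums.values)

# Count the missing numbers per bin; tall bins mark ranges with many gaps.
counts, edges = np.histogram(missing, bins=2, range=(0, max_value + 1))

print(missing)
print(counts, edges)
```

The `counts` array could then be drawn as a bar chart, which stays readable even when a million individual points would not.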

How to plot a scatter diagram using rpy2 in python?

I have a dataset like below in dictionary format,
data={'a': [10, 11,12,5,4,3,1], 'b': [7, 18,5,11,9,2,0]}
How can we make a scatter plot in Python using rpy2, where the x-axis is the months and the y-axis ticks are multiples of 5? We need to plot the graph with the above values, where a and b are the data points.
Months should be based on the length of each key, i.e. for the above data we have 7 months since we have 7 data points.
This is a pretty involved data structure, and it's not completely clear what you're looking to do in terms of plotting. Here are a few hints, but it'd be easiest to help you if you would post the code you've tried but hasn't worked.
The R plot function takes two vectors corresponding to the x-axis values (months, here), and y-axis values (frequencies?). You'll want to go through your graph_data dictionary and calculate the y-axis values you want to plot for each month, and then make a corresponding vector for x containing the month numbers. For example:
x = [1,2,3,4]
y = [0.7, 0.9, 0.2, 0.4]
To do the plotting from rpy2, you'll need to convert the lists to vectors like so:
from rpy2 import robjects
x_vector = robjects.IntVector(x)
y_vector = robjects.FloatVector(y)
Then do the plotting:
robjects.r.plot(x_vector, y_vector, xlab="month", ylab="freq", main="")
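Building the x and y lists from the dictionary in the question might look like the sketch below. It assumes that every value in a list belongs to consecutive months starting at 1 and that both keys should end up in the same scatter plot; neither assumption is confirmed by the question. The rpy2 calls are left out so the snippet runs without R installed.

```python
data = {'a': [10, 11, 12, 5, 4, 3, 1], 'b': [7, 18, 5, 11, 9, 2, 0]}

x = []  # month number for every data point
y = []  # the value observed in that month
for key, values in data.items():
    # Month numbers 1..n, where n is the number of points under this key.
    x.extend(range(1, len(values) + 1))
    y.extend(values)

print(x)
print(y)
```

The resulting lists can be wrapped with robjects.IntVector as shown above and passed to robjects.r.plot.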