How to calculate performance curve for each row of data - powerbi

I want to plot a performance curve for each row of my data.
As a simplified example, take the function Y = m*X + b: I have a table of m and b values, and for each row I want the Y values for X = 1 to 10.
How can this be calculated?
A Y = mX + b example can be seen in the following plot:

The following works:
-- X values 1 to 10
WITH Numbers AS
(
    SELECT N FROM (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) N(N)
),
-- One row per curve: slope m and intercept b
Examples AS
(
    SELECT m, b FROM (VALUES (1,2),(2,2)) N(m,b)
)
SELECT
    'Y = ' + CAST(Examples.m AS varchar(10)) + 'X + ' + CAST(Examples.b AS varchar(10)) AS Formula
    ,Numbers.N AS X
    ,Numbers.N * Examples.m + Examples.b AS Y
FROM Examples
CROSS JOIN Numbers
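For readers who want to check the numbers outside the database, the same cross join can be sketched in Python. This is only an illustration: the `examples` list mirrors the two (m, b) rows used in the query, and the function name `curve_points` is made up for this sketch.

```python
from itertools import product

# Hypothetical (m, b) rows, matching the two example rows in the query.
examples = [(1, 2), (2, 2)]
xs = range(1, 11)  # X = 1 to 10


def curve_points(rows, x_values):
    """Return one (formula, x, y) tuple per combination of row and X value."""
    points = []
    for (m, b), x in product(rows, x_values):
        formula = "Y = {}X + {}".format(m, b)
        points.append((formula, x, m * x + b))
    return points


points = curve_points(examples, xs)
for formula, x, y in points[:3]:
    print(formula, x, y)
```

Each (m, b) row is paired with every X value, which is what CROSS JOIN does, so two rows and ten X values give twenty points.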

Related

Power BI - Matching closest 3D points from two tables

I have two tables (Table 1 and Table 2), both containing thousands of three-dimensional point coordinates (X, Y, Z). Table 2 also has an attribute column.
Table 1
X     Y      Z
6007  44268  1053
6020  44269  1051
Table 2
X     Y      Z     Attribute
6011  44310  1031  A
6049  44271  1112  B
I need to populate a calculated column in Table 1 with an attribute from Table 2 based on the minimum distance between points in 3D space. Basically, match the points in Table 1 to the closest point in Table 2 and then fetch the attribute from Table 2.
So far I have tried rounding X, Y and Z in both tables, then concatenating the rounded values into a separate column (XYZ) in each table. I then use this DAX expression:
CALCULATE(FIRSTNONBLANK(Table2[Attribute], 1), FILTER(ALL(Table2), Table2[XYZ] = Table1[XYZ]))
This has given me reasonable success, depending on the degree of rounding applied to the coordinates.
Is there a better way to achieve this in Power BI?

This is similar to this post, except with a simpler distance function. See also this post.
Assuming you want the standard Euclidean Distance:
ClosestPointAttribute =
MINX (
    TOPN (
        1,
        Table2,
        ( Table2[X] - Table1[X] ) ^ 2 +
        ( Table2[Y] - Table1[Y] ) ^ 2 +
        ( Table2[Z] - Table1[Z] ) ^ 2,
        ASC
    ),
    Table2[Attribute]
)
Note: I've omitted the SQRT from the formula because we don't need the actual distance, just the ordering (and SQRT preserves order since it's a strictly increasing function). You can include it if you prefer.
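The same idea (rank by squared distance, take the top row, read its attribute) can be sketched in Python to sanity-check results on a small sample. The data below is copied from the two tables in the question; the function names are invented for this sketch, and the brute-force scan is only meant for small inputs.

```python
# Sample rows copied from Table 1 and Table 2 in the question.
table1 = [(6007, 44268, 1053), (6020, 44269, 1051)]
table2 = [(6011, 44310, 1031, "A"), (6049, 44271, 1112, "B")]


def squared_distance(p, q):
    """Squared Euclidean distance; enough for ranking, so no square root."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def closest_attribute(point, candidates):
    """Return the attribute of the candidate row nearest to `point`."""
    nearest = min(candidates, key=lambda row: squared_distance(point, row[:3]))
    return nearest[3]


matches = [closest_attribute(p, table2) for p in table1]
print(matches)
```

With the four sample rows, both Table 1 points are closer to the point labelled A than to the one labelled B.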

A function in M Code:
(p1 as list, q1 as list)=>
let
    f = List.Generate(
        ()=> [x = Number.Power(p1{0}-q1{0},2), idx=0],
        each [idx]<List.Count(p1),
        each [x = Number.Power(p1{[idx]+1}-q1{[idx]+1},2), idx=[idx]+1],
        each [x]
    ),
    r = Number.Sqrt(List.Sum(f))
in
    r
Each list is a set of coordinates, and the function returns the distance between p1 and q1.
The above function (which I named fnDistance) can be incorporated into Power Query code as in this example:
let
    //Read in both tables and set data types
    Source2 = Excel.CurrentWorkbook(){[Name="Table_2"]}[Content],
    table2 = Table.TransformColumnTypes(Source2,{{"X", Int64.Type}, {"Y", Int64.Type}, {"Z", Int64.Type},{"Attribute", Text.Type}}),
    Source = Excel.CurrentWorkbook(){[Name="Table_1"]}[Content],
    table1 = Table.TransformColumnTypes(Source,{{"X", Int64.Type}, {"Y", Int64.Type}, {"Z", Int64.Type}}),
    //calculate distances from Table 1 coordinates to each of the Table 2 coordinates and store in a List
    custom = Table.AddColumn(table1,"Distances", each
        let
            t2 = Table.ToRecords(table2),
            X=[X],
            Y=[Y],
            Z=[Z],
            distances = List.Generate(()=>
                [d=fnDistance({X,Y,Z},{t2{0}[X],t2{0}[Y],t2{0}[Z]}),a=t2{0}[Attribute], idx=0],
                each [idx] < List.Count(t2),
                each [d=fnDistance({X,Y,Z},{t2{[idx]+1}[X],t2{[idx]+1}[Y],t2{[idx]+1}[Z]}),a=t2{[idx]+1}[Attribute], idx=[idx]+1],
                each {[d],[a]}),
            //determine the set of coordinates with the minimum distance and return the associated Attribute
            minDistance = List.Min(List.Alternate(List.Combine(distances),1,1,1)),
            attribute = List.Range(List.Combine(distances), List.PositionOf(List.Combine(distances),minDistance)+1,1){0}
        in
            attribute, Text.Type)
in
    custom

Interpolation of two variables on a 3D grid

I need to interpolate two variables defined on one 3D grid onto another 3D grid. I tried the inverse distance method, but the result contains only two values, assigned to every point of the new grid, and they do not represent the distribution on the original grid. Here is an example of my code:
text=text[pstart:pend]
x=[]
y=[]
z=[]
for line in text:
    coords=line.split()
    x.append(float(coords[2])) #coordinates of the new grid
    y.append(float(coords[1]))
    z.append(float(coords[0]))
Xg=np.asarray([x,y,z])
# Gather mean flow data
xd=[]
yd=[]
zd=[]
cd=[]
rhod=[]
with open(meanflowdata,'rb') as csvfile:
    spamreader=csv.reader(csvfile, delimiter=',')
    for row in spamreader:
        if len(row)>2:
            xd.append(float(row[0])) #coordinates and values of the source file
            yd.append(float(row[1]))
            zd.append(float(row[2]))
            cd.append(float(row[3]))
            rhod.append(float(row[4]))
Xd=np.asarray([xd,yd,zd])
Zd=np.asarray([cd,rhod])
leafsize = 20
print "# setting up KDtree"
invdisttree = Invdisttree( Xd.T, Zd.T, leafsize=leafsize, stat=1 )
print "# Performing interpolation"
interpol = invdisttree( Xg.T )
c=interpol.T[0]
rho=interpol.T[1]
As far as I can tell, the problem occurs when I call the invdisttree function, which does not work properly. Does anyone have an idea, or an alternative interpolation method to suggest?

Where do interpol.T[0] and interpol.T[1] come from, and where did your Invdisttree come from?
This one on SO has
invdisttree = Invdisttree( X, z )  # data points, values
interpol = invdisttree( q, nnear=3, eps=0, p=1, weights=None, stat=0 )
In your case X could be 100 x 3, z 100 x 2, and query points q 10 x 3 ⟶ interpol 10 x 2.
(invdisttree is a function, which you call to do the interpolation: interpol = invdisttree( q ... ). Is that confusing?)
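To make the shapes concrete, here is a minimal brute-force sketch of inverse distance weighting in plain Python. It is not the Invdisttree class discussed above and uses no k-d tree; the function `idw_interpolate` is hypothetical, and its `nnear` and `p` arguments are only named after the parameters in the call shown above. It assumes Python 3.8+ for `math.dist`.

```python
import math


def idw_interpolate(points, values, queries, nnear=3, p=1):
    """Brute-force inverse distance weighting.

    points:  N data points, each a tuple of coordinates
    values:  N rows, each holding one value per interpolated variable
    queries: M query points
    Returns M rows with the same number of variables as `values`.
    """
    results = []
    for q in queries:
        # Distance from the query point to every data point, nearest first.
        ranked = sorted(
            ((math.dist(q, pt), row) for pt, row in zip(points, values)),
            key=lambda pair: pair[0],
        )
        nearest = ranked[:nnear]
        if nearest[0][0] == 0.0:
            # The query coincides with a data point: return its values as-is.
            results.append(list(nearest[0][1]))
            continue
        weights = [1.0 / d ** p for d, _ in nearest]
        total = sum(weights)
        n_vars = len(nearest[0][1])
        results.append([
            sum(w * row[k] for w, (_, row) in zip(weights, nearest)) / total
            for k in range(n_vars)
        ])
    return results


# Four data points (N x 3) carrying two variables each (N x 2).
points = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
values = [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0), (4.0, 40.0)]
queries = [(0, 0, 0), (0.5, 0.0, 0.0)]  # M x 3

result = idw_interpolate(points, values, queries, nnear=2)
print(result)  # M x 2: one row per query point, one column per variable
```

Note that data points and values are passed with one row per point, which is why the question's arrays are transposed with `.T` before being handed over.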

Parameterized SQL query in a loop not updating correctly

I have an SQL query running in a loop. There are two values, FINGER and index_str, that both need to be updated in parallel.
FINGER: (numpy array)
[['1012_8']
['10214_5']
['10409_9']
index_str: (pandas dataframe)
0 14,38,51,65,84,85
1 3,34,58,65,66,75
2 3,15,68,70,80,82
Above are the first 3 examples. There are over 1000 of each in reality.
for i in range(len(FINGER)):
    print i
    print FINGER[i]
    for x in index_str[i]:
        yy = FINGER[i][0]
        #print range(len(FINGER))
        index_str = str(x)
        query = "SELECT finger, ind, x,y, CAST( (direction*180/3.142)as INT),CAST(quality*100 as INT) from UNIL_fingerprints where finger = '" + yy + "' and ind IN (" + index_str + ") order by ind "
        print query
        c.execute(query)
        rows = c.fetchall()
        print rows
Above is the loop and query in question.
So far the loop runs through all values of index_str for only the first FINGER value. To elaborate, the query updates for the first 3 examples as follows.
SELECT finger, ind, x,y, CAST( (direction*180/3.142)as INT),CAST(quality*100 as INT) from UNIL_fingerprints where finger = '1012_8' and ind IN (14,38,51,65,84,85) order by ind
SELECT finger, ind, x,y, CAST( (direction*180/3.142)as INT),CAST(quality*100 as INT) from UNIL_fingerprints where finger = '1012_8' and ind IN (3,34,58,65,66,75) order by ind
SELECT finger, ind, x,y, CAST( (direction*180/3.142)as INT),CAST(quality*100 as INT) from UNIL_fingerprints where finger = '1012_8' and ind IN (3,15,68,70,80,82) order by ind
Whereas '1012_8' should be '10214_5' and '10409_9' respectively in the 2nd and 3rd query above.
Any ideas on how to get this to update properly would be helpful.

You want zip():
for finger, indexes in zip(FINGER, index_str):
    print("fingers : {}- indexes: {}".format(finger, indexes))
Also, you REALLY want to learn and use the DB-API properly (well, unless you don't mind being hacked, that is).
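As a rough illustration of both points, the sketch below pairs the two sequences with zip() and passes every value as a query parameter instead of concatenating strings. It uses the standard-library sqlite3 module with an in-memory table as a stand-in; the real database driver, its placeholder style, and the reduced column list are assumptions, and `fetch_rows` is a made-up name.

```python
import sqlite3


def fetch_rows(conn, fingers, index_lists):
    """Run one parameterized query per (finger, indexes) pair."""
    results = []
    for finger, indexes in zip(fingers, index_lists):
        ind_values = [int(s) for s in indexes.split(",")]
        # One "?" per value; only placeholders are spliced into the SQL text.
        placeholders = ",".join("?" for _ in ind_values)
        query = (
            "SELECT finger, ind, x, y FROM UNIL_fingerprints "
            "WHERE finger = ? AND ind IN (" + placeholders + ") ORDER BY ind"
        )
        rows = conn.execute(query, [finger] + ind_values).fetchall()
        results.append(rows)
    return results


# Stand-in data: a tiny in-memory table with made-up coordinates.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE UNIL_fingerprints (finger TEXT, ind INTEGER, x INTEGER, y INTEGER)"
)
conn.executemany(
    "INSERT INTO UNIL_fingerprints VALUES (?, ?, ?, ?)",
    [("1012_8", 14, 1, 2), ("1012_8", 38, 3, 4),
     ("10214_5", 3, 5, 6), ("10214_5", 99, 7, 8)],
)

fingers = ["1012_8", "10214_5"]
index_lists = ["14,38,51", "3,34,58"]
results = fetch_rows(conn, fingers, index_lists)
print(results)
```

Because zip() advances both sequences together, the second query is run for the second finger rather than repeating the first one.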

pyspark mathematical computation in a dataframe

I have extracted a DataFrame from a larger DataFrame, and now I need to do simple computations such as addition and division on it.
A sample of the DataFrame:
item counts
z 23156
x 15462
What I need to do is divide x by the sum of x and z, for example:
value = x / (x + z)

You must compute the sum of x and z first, then divide x by sum(x) + sum(z).
For example:
Table 1 (original table):
x z
1 2
3 4
Table 2 (aggregated table):
table2 = sqlCtx.sql("select sum(x) + sum(z) as sum_xz from table1")
table2.registerTempTable("table2")
sum_xz
10
Then join both tables and divide:
table3 = sqlCtx.sql("select a.x / b.sum_xz from table1 a join table2 b")
For your reference.
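The arithmetic itself can be checked in plain Python before moving it into Spark. This sketch uses the counts from the question's sample DataFrame; the `share` function is a name invented here, and it is not PySpark code. It also shows why the parentheses matter.

```python
# Counts copied from the sample DataFrame in the question.
counts = {"z": 23156, "x": 15462}


def share(counts, item):
    """Return the count of `item` divided by the total of all counts."""
    total = sum(counts.values())
    return counts[item] / total


value = share(counts, "x")  # x / (x + z)

# Without parentheses the expression is parsed as (x / x) + z.
unparenthesized = counts["x"] / counts["x"] + counts["z"]

print(value, unparenthesized)
```

The first result is a fraction between 0 and 1, while the unparenthesized form evaluates to 1 plus the count of z.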

Piecewise linear regression with SAS PHREG

How to implement a piecewise linear regression model in PHREG procedure of SAS?
For example with one knot at X=T:
Y = β_10 + β_11 . X if X ≤ T
Y = β_20 + β_21 . X if X >T
Given the model with the constraint of continuity:
Y = β_10 + β_11 . X if X ≤ T
Y = β_10 + (β_11 - β_21) T + β_21 . X if X >T
i.e.:
Y= β_0 + β_1 . X + S_1
where
S_1 = ( β_11 - β_21 ) T if X >T and 0 otherwise.
Finally, I would like to include it in a Cox model:
Proc PHREG ;
Model time * cas (censure) = X S_1 ;
Run ;
But the problem is S_1 has unknown beta coefficients in it.
Thanks for your help!
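One common way around unknown coefficients inside a covariate is to rewrite the continuous model with a hinge term, (X - T)+ = max(X - T, 0), which depends only on X and the knot T. The Python sketch below is not SAS and not a Cox model; it only checks numerically that the continuity-constrained model above equals β_10 + β_11 . X + (β_21 - β_11) . (X - T)+. The knot and coefficient values are hypothetical, chosen just to test the algebra.

```python
def hinge(x, knot):
    """Hinge covariate (x - knot)+ : zero up to the knot, x - knot beyond it."""
    return max(x - knot, 0.0)


def piecewise(x, knot, b10, b11, b21):
    """The continuity-constrained two-segment model from the question."""
    if x <= knot:
        return b10 + b11 * x
    return b10 + (b11 - b21) * knot + b21 * x


def hinge_form(x, knot, b10, b11, b21):
    """Same model as one linear expression in x and hinge(x, knot)."""
    return b10 + b11 * x + (b21 - b11) * hinge(x, knot)


# Hypothetical knot and coefficients, used only to check the algebra.
T, b10, b11, b21 = 5.0, 1.0, 0.5, 2.0
xs = [0.0, 2.5, 5.0, 7.5, 10.0]
agree = all(
    abs(piecewise(x, T, b10, b11, b21) - hinge_form(x, T, b10, b11, b21)) < 1e-12
    for x in xs
)
print(agree)
```

In this form the second covariate is a known function of X and T, and the change in slope (β_21 - β_11) becomes an ordinary regression coefficient to be estimated.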
