Problem while implementing the correlation coefficient - Power BI

I'm new to Power BI/Power Query and I was trying to write a function that calculates the correlation coefficient of two given lists.
I used the formula in the following image:
[Image: the correlation coefficient formula][1]
let
    Function = (l1 as list, l2 as list) =>
        let
            CorCoefNumerator = List.Sum((l1 - List.Average(l1)) * (l2 - List.Average(l2))),
            Denominator1 = List.Sum(Number.Power(l1 - List.Average(l1), 2)),
            Denominator2 = List.Sum(Number.Power(l2 - List.Average(l2), 2)),
            CorCoefDenominator = Number.Sqrt(Denominator1 - Denominator2),
            CorCoef = Value.Divide(CorCoefNumerator, CorCoefDenominator)
        in
            CorCoef
in
    Function(Table.ToList([Sales]), Table.ToList([Profit]))
The error message I'm getting is:
An error occurred in the ‘’ query. Expression.Error: There is an unknown identifier. Did you use the [field] shorthand for a _[field] outside of an 'each' expression?
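For comparison, the usual Pearson formula multiplies the two sums of squared deviations under the square root, r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²), whereas the query above subtracts them (`Denominator1 - Denominator2`). Note also that M does not apply `-`, `*` or `Number.Power` element-wise to lists, so the per-element steps would need something like `List.Transform` or `List.Zip`, and `[Sales]` only works inside an `each`; a column can be taken as a list with `TableName[Sales]`, where `TableName` stands for your table or query step. The Python sketch below is not M; it only spells out the intended arithmetic, and the function name `pearson` and the sample lists are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("expected two lists of the same length (at least 2 items)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Deviations from the mean, computed element by element
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    numerator = sum(a * b for a, b in zip(dx, dy))
    # The two sums of squares are multiplied (not subtracted) under the root;
    # a constant list makes this 0 and the division raises ZeroDivisionError
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return numerator / denominator


# Hypothetical sample data standing in for the Sales and Profit columns
sales = [10, 20, 30, 40]
profit = [1, 3, 2, 5]
print(pearson(sales, profit))
```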
One more question: is there a way to use DAX functions while writing Power Query queries? When I first tried to compute this correlation coefficient I needed it to work on columns, but since I couldn't use DAX functions I had to use my columns as lists!
[1]: https://i.stack.imgur.com/5z3uJ.png


PowerQuery: How to replace text with each column name for multiple columns

I'm trying to replace "x" in each column (except for the first two columns) with the column name, in a table with an unknown number of columns but at least two.
I found the code to change one column, but I want it to be dynamic:
#"Ersatt värde" = Table.ReplaceValue(Källa,"x", Table.ColumnNames(Källa){2},Replacer.ReplaceText,{Table.ColumnNames(Källa){2}})
Any ideas on how to solve it?
If I understand correctly, I think you can try either approach below:
#"Ersatt värde" =
let
columnsToTransform = List.Skip(Table.ColumnNames(Källa), 2),
accumulated = List.Accumulate(columnsToTransform, Källa, (tableState as table, columnName as text) =>
Table.ReplaceValue(tableState,"x", columnName, Replacer.ReplaceText, {columnName})
)
in accumulated
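`List.Accumulate` here is a fold: it threads the table through one `Table.ReplaceValue` call per column name. A rough Python analogue uses `functools.reduce` in the same role. The table is modelled as a dict of column name to list of values, and the column names and cells are hypothetical; leaving non-text cells untouched is an assumption of this sketch.

```python
from functools import reduce


def replace_in_column(table, column_name):
    """One fold step: replace "x" with the column's own name inside that column."""
    new_table = dict(table)  # shallow copy, so the input table is not modified
    new_table[column_name] = [
        # Substring replacement on text values; other values are left as they are
        v.replace("x", column_name) if isinstance(v, str) else v
        for v in table[column_name]
    ]
    return new_table


def replace_x_with_column_names(table):
    # Like List.Skip(Table.ColumnNames(Källa), 2): every column except the first two
    columns_to_transform = list(table)[2:]
    # Like List.Accumulate: thread the table through one step per column name
    return reduce(replace_in_column, columns_to_transform, table)


# Hypothetical table: column name -> list of cell values
source = {
    "Id": [1, 2],
    "Name": ["x", "y"],
    "Apples": ["x", None],
    "Pears": [None, "x"],
}
print(replace_x_with_column_names(source))
```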
or:
#"Ersatt värde" =
let
columnsToTransform = List.Skip(Table.ColumnNames(Källa), 2),
transformations = List.Transform(columnsToTransform, (columnName) => {columnName, each
Replacer.ReplaceText(Text.From(_), "x", columnName)}),
transformed = Table.TransformColumns(Källa, transformations)
in transformed,
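The second approach builds the whole list of `{columnName, function}` pairs first and then applies them in a single `Table.TransformColumns` call. A rough Python analogue, again with a hypothetical dict-of-lists table, might look like this. As with `Text.From(_)` in the M version, non-text values are converted to text before the replacement, so the number 3 comes back as the string "3"; passing `None` (null) through unchanged is an assumption of this sketch.

```python
def make_transformations(columns_to_transform):
    """Build (column_name, function) pairs, like the list given to Table.TransformColumns."""
    return [
        # name=name binds the current column name; a plain closure would see only the last one
        (name, lambda v, name=name: None if v is None else str(v).replace("x", name))
        for name in columns_to_transform
    ]


def transform_columns(table, transformations):
    """Apply each function to every value of its column, returning a new table."""
    new_table = dict(table)
    for name, fn in transformations:
        new_table[name] = [fn(v) for v in table[name]]
    return new_table


# Hypothetical table: column name -> list of cell values
source = {"Id": [1, 2], "Name": ["x", "y"], "Apples": ["x", 3], "Pears": [None, "y"]}
transformations = make_transformations(list(source)[2:])
print(transform_columns(source, transformations))
```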
Both ways follow a similar approach:
1. Figure out which columns to do the replacements in (i.e. all except the first two columns).
2. Loop over the columns determined in the previous step and do the replacement.
I've used Replacer.ReplaceText since that's what you used in your question, but I believe it will replace both partial and full matches.
If you only want full matches to be replaced, I think you can use Replacer.ReplaceValue instead.
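To make the partial versus full match difference concrete, here is a small Python sketch of the two behaviours. The helper names `replace_text` and `replace_value` are hypothetical and only imitate what `Replacer.ReplaceText` (substring replacement) and `Replacer.ReplaceValue` (whole-value replacement) are described as doing above; they are not the Power Query functions themselves.

```python
def replace_text(value, old, new):
    """Partial matching: every occurrence of `old` inside a text value is replaced."""
    return value.replace(old, new) if isinstance(value, str) else value


def replace_value(value, old, new):
    """Full matching: only a value exactly equal to `old` is replaced."""
    return new if value == old else value


cells = ["x", "box", "xx", "y"]
print([replace_text(c, "x", "Apples") for c in cells])
print([replace_value(c, "x", "Apples") for c in cells])
```

With these sample cells, the first call turns "box" into "boApples" and "xx" into "ApplesApples", while the second only replaces the cell that is exactly "x".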

POWER BI: Creating a calculated field based on multiple filters

I am fairly new to Power BI and have searched various online help forums but was unable to find a question similar to mine, hence posting it here. I'm not sure whether this is fairly straightforward or complicated (as I think!).
I have 3 columns: 'Event', 'Screen' and 'Time' (Similar to below)
I want to do a single calculation as below:
(2*count of "NameSubmitted" occurrences in Event) - (AVG Time of corresponding "NameSubmitted" (from Event) * count of "NameSubmitted" occurrences in Event)
+
(2*count of "AddressAdded" occurrences in Event) - (AVG Time of corresponding "AddAddress" (from screen) * count of "AddressAdded" occurrences in Event)
+
(2*count of "OrderCreated" occurrences in Event) - (AVG Time of corresponding sum of "Orders"+"OrderDetail"+"OrderConfirmation" (from screen) * count of "OrderCreated" occurrences in Event)
My approach:
I have tried to create a new column with the following IF() calculation (started like below), but in vain; I keep receiving the following error:
Calc =
CALCULATE(
(2*SUM(IF(Table[Event] = "NameSubmitted",1,BLANK())))
- AVERAGE(IF(Table[Event] = "NameSubmitted")
)) .....
Error: The SUM function only accepts a column reference as an argument
Any help is much appreciated.
I'm not going to give a complete answer, but the last piece should look something like this:
= 2 * COUNTROWS(FILTER(Table1, Table1[Event] = "OrderCreated"))
- CALCULATE(AVERAGE(Table1[Time]),
Table1[Screen] IN {"Orders", "OrderDetail", "OrderConfirmation"})
* COUNTROWS(FILTER(Table1, Table1[Event] = "OrderCreated"))
Note: It's more efficient if you factor out the COUNTROWS():
= COUNTROWS(FILTER(Table1, Table1[Event] = "OrderCreated")) *
(2 - CALCULATE(AVERAGE(Table1[Time]),
Table1[Screen] IN {"Orders", "OrderDetail", "OrderConfirmation"}))
You should be able to get the rest from there.
There are a variety of ways to count occurrences instead of using COUNTROWS(). The closest to your attempt would probably be like this:
= SUMX(Table1, IF(Table1[Event]="NameSubmitted", 1, BLANK()))
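As a sanity check of the algebra in this answer, the Python sketch below computes the "OrderCreated" term on a few hypothetical rows (the events, screens and times are invented for illustration, and report filter context is ignored). It counts the matching events once by filtering and counting, as `COUNTROWS(FILTER(...))` does, and once by summing 1 per matching row, as the `SUMX(..., IF(..., 1, BLANK()))` version does, and shows that the unfactored form 2·n − avg·n equals the factored form n·(2 − avg).

```python
from statistics import mean

# Hypothetical rows with the question's columns: Event, Screen, Time
rows = [
    {"Event": "OrderCreated", "Screen": "Orders", "Time": 0.25},
    {"Event": "Viewed", "Screen": "OrderDetail", "Time": 0.5},
    {"Event": "OrderCreated", "Screen": "OrderConfirmation", "Time": 0.75},
    {"Event": "NameSubmitted", "Screen": "Home", "Time": 2.0},
]
ORDER_SCREENS = {"Orders", "OrderDetail", "OrderConfirmation"}

# COUNTROWS(FILTER(Table1, Table1[Event] = "OrderCreated"))
n_filter = len([r for r in rows if r["Event"] == "OrderCreated"])

# SUMX(Table1, IF(Table1[Event] = "OrderCreated", 1, BLANK())), with 0 standing in for BLANK
n_sumx = sum(1 if r["Event"] == "OrderCreated" else 0 for r in rows)

# CALCULATE(AVERAGE(Table1[Time]), Table1[Screen] IN {...}): mean Time over the order screens
avg_time = mean(r["Time"] for r in rows if r["Screen"] in ORDER_SCREENS)

unfactored = 2 * n_filter - avg_time * n_filter
factored = n_filter * (2 - avg_time)
print(n_filter, n_sumx, avg_time, unfactored, factored)
```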

Reading Values from a Bag of Tuples in pig

I have UDF output as follows.
Sample records:
({(Todd,1),(Todd,1),(Todd,1),(Todd,1),(Todd,1),(Todd,5),(Todd,10),(Todd,20),(Todd,10),(Todd,10),(Todd,10),(Todd,10),(Todd,10),(Todd,10)})
({(Jon,1),(Jon,1),(Jon,1),(Jon,1),(Jon,1),(Jon,5),(Jon,10),(Jon,20),(Jon,10),(Jon,10),(Jon,10),(Jon,10),(Jon,5),(Jon,20),(Jon,1)})
Schema for UDF: name:chararray (1 single column)
Now I want to read this bag of tuples and generate output as:
Todd,240
Jon,422
I stored the output of the UDF in a temp file and read it back using a different schema:
D = LOAD '/home/training/pig/pig/UDFdata.txt' AS (B: bag {T: tuple(name:chararray, denom:int)});
After that, I am trying to use a FOREACH and dot notation to compute the sum.
X = foreach D generate B.T.name,SUM(B.T.denom);
2017-03-04 13:52:59,507 ERROR org.apache.pig.tools.grunt.Grunt: ERROR 1128: Cannot find field T in name:chararray,denom:int Details at logfile: /home/training/pig_1488648405070.log
Can you please let me know how to do this? I am new to Apache Pig, so I'm not sure how to traverse a bag of tuples and compute the sum.
GROUP the dataset on name before performing SUM.
To be able to GROUP on name, first FLATTEN the bag:
flattened = FOREACH D GENERATE FLATTEN(B);
dump flattened;
...
(Todd,10)
(Todd,10)
(Jon,1)
(Jon,1)
....
Then, GROUP them on name
grouped = GROUP flattened by name;
dump grouped;
(Jon,{(Jon,1),(Jon,20),(Jon,5),(Jon,10),(Jon,10),(Jon,10),(Jon,10),(Jon,20),(Jon,10),(Jon,5),(Jon,1),(Jon,1),(Jon,1),(Jon,1),(Jon,1)})
(Todd,{(Todd,10),(Todd,10),(Todd,10),(Todd,10),(Todd,10),(Todd,10),(Todd,20),(Todd,10),(Todd,5),(Todd,1),(Todd,1),(Todd,1),(Todd,1),(Todd,1)})
And apply SUM() over the result
final_sum = FOREACH grouped GENERATE group, SUM(flattened.denom);
dump final_sum;
(Jon,106)
(Todd,100)
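The same FLATTEN, GROUP, SUM data flow can be mirrored in Python to check the totals above. This sketch is only an illustration of the data flow, not Pig code; it uses the two sample bags from the question, flattens them into (name, denom) pairs, groups the pairs by name, and sums each group.

```python
from collections import defaultdict

# Each record of D holds one bag of (name, denom) tuples (sample data from the question)
D = [
    [("Todd", d) for d in (1, 1, 1, 1, 1, 5, 10, 20, 10, 10, 10, 10, 10, 10)],
    [("Jon", d) for d in (1, 1, 1, 1, 1, 5, 10, 20, 10, 10, 10, 10, 5, 20, 1)],
]

# FLATTEN(B): one (name, denom) tuple per output row
flattened = [t for bag in D for t in bag]

# GROUP flattened BY name: each group keeps the full tuples, as in the dump above
grouped = defaultdict(list)
for name, denom in flattened:
    grouped[name].append((name, denom))

# FOREACH grouped GENERATE group, SUM(flattened.denom)
final_sum = {name: sum(denom for _, denom in bag) for name, bag in grouped.items()}
print(final_sum)
```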

referencing data within the groups of a list in F#

I performed my first query with a grouping function using the following code:
let dc = new TypedDataContext()
open MathNet.Numerics.Statistics
let newData = query { for x in dc.MyData do
where (x.ID = "number of type string")
groupBy x.Code into g
let average = query { for x in g do
averageBy x.Billed_Amt }
select (g, average) }
|> Seq.toList
System.Console.WriteLine(newData)
I now want to calculate the standard deviation of the billed amounts in each group. However, when I try to reference the column 'Billed_Amt' like so
let sd = newData.Billed_Amt.StandardDeviation()
I receive the following error: "constructor or member 'Billed_Amt' is not defined."
I also tried newData.g.Billed_Amt.StandardDeviation(), in case I needed to reference the groups first, but I got the same error message referring to 'g'.
How do I overcome this?
I did notice that the 'list' I created with the query claims to have 63 items. These would be the different group keys, not the rows of data themselves.
I feel like the research I have done online to solve this problem has just sent me in circles. Most of the resources online do not dumb it down enough for newbies like me. Any help would be appreciated.
Thank you!
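Since the query selects `(g, average)`, each element of `newData` is a pair of one group and its average rather than a row, which fits the 63 items and explains why the list has no `Billed_Amt` member. The standard deviation therefore has to be computed inside each group, from that group's billed amounts. The sketch below shows that per-group calculation in Python, not F#; the rows are hypothetical `(Code, Billed_Amt)` pairs, and `statistics.stdev` gives the sample standard deviation (it needs at least two values per group).

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical rows standing in for dc.MyData: (Code, Billed_Amt)
rows = [("A", 10.0), ("A", 14.0), ("B", 3.0), ("B", 5.0), ("B", 7.0)]

# groupBy x.Code into g: collect each group's billed amounts
groups = defaultdict(list)
for code, billed_amt in rows:
    groups[code].append(billed_amt)

# One result per group: (key, average, sample standard deviation)
summary = [(code, mean(amts), stdev(amts)) for code, amts in groups.items()]
for code, avg, sd in summary:
    print(code, avg, round(sd, 3))
```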

XSB prolog: Problems with lists

I'm new to XSB prolog and I'm trying to solve this problem.
I've got prices of products and some orders. It looks like this:
price(cola,3).
price(juice,1).
price(sprite,4).
% a product, e.g. cola, with a price of 3 (the currency doesn't matter)
order(1, [cola,cola,sprite]).
order(2, [sprite,sprite,juice]).
order(3, [juice,cola]). % the number here is the number of the order
% and the list represents the products that
% belong to that order
Now, my task is to write a new predicate called bill/2. It should take the number of the order and then sum up the prices of all the products in that order (list).
Something like:
|?-bill(1,R).
R= 10 ==> ((cola)3 + (cola)3 + (sprite)4 = 10)
|?-bill(2,R).
R= 9 ==> ((sprite)4 + (sprite)4 + (juice)1 = 9)
and so on... I know how to get to the number of the order, but I don't know how to get each product from the list inside that order to look up its price, so I can sum it up.
Thanks in advance.
In plain Prolog, first get all numbers in a list, then sum the list:
bill(Ord, Tot) :-
order(Ord, Items),
findall(Price, (member(I, Items), price(I, Price)), Prices),
sum_list(Prices, Tot).
but since XSB has tabling available, there could be a better way, using some aggregation function.
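To check the expected totals from the question, the findall/sum_list idea can be mirrored in Python. This is only an illustration of the logic, not XSB code, and storing the facts as dicts is an assumption of the sketch. As with `findall/3` over `price/2`, an item without a price fact is simply skipped.

```python
# Facts from the question, stored as dicts
PRICE = {"cola": 3, "juice": 1, "sprite": 4}
ORDER = {
    1: ["cola", "cola", "sprite"],
    2: ["sprite", "sprite", "juice"],
    3: ["juice", "cola"],
}


def bill(order_number):
    """Total price of an order; an unknown order number raises KeyError (the Prolog goal would fail)."""
    items = ORDER[order_number]
    # findall(Price, (member(I, Items), price(I, Price)), Prices):
    # an item without a price fact contributes no solution, so it is skipped
    prices = [PRICE[item] for item in items if item in PRICE]
    # sum_list(Prices, Tot)
    return sum(prices)


for n in ORDER:
    print(n, bill(n))
```

With the question's facts this gives 10 for order 1 and 9 for order 2, matching the expected results.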