MySQL GROUP_CONCAT incomplete, well under group_concat_max_len - casting

The issue is shown in the query below.
The table has a column value_mean, which is a FLOAT.
The problem is the last GROUP_CONCAT: it cuts off the last 2 characters. Why?
MariaDB [testdb]> SELECT
a.value_mean,
b.value_mean,
HEX(a.value_mean),
HEX(b.value_mean),
a.value_mean / b.value_mean,
a.value_mean / b.value_mean * 100 - 100,
CONCAT(a.value_mean / b.value_mean, 'AFTER'),
CONCAT(a.value_mean / b.value_mean * 100 - 100, 'AFTER'),
GROUP_CONCAT( CONCAT(a.value_mean / b.value_mean, 'AFTER') ),
GROUP_CONCAT( CONCAT(a.value_mean / b.value_mean * 100 - 100, 'AFTER') )
FROM
baseline_result AS a
JOIN baseline_result AS b
WHERE
a.id = 125716755
AND b.id=125717382
\G
*************************** 1. row ***************************
value_mean: 15141600000
value_mean: 15141600000
HEX(a.value_mean): 38681F000
HEX(b.value_mean): 38681F400
a.value_mean / b.value_mean: 0.9999999323715897
a.value_mean / b.value_mean * 100 - 100: -0.0000067628410249653825
CONCAT(a.value_mean / b.value_mean, 'AFTER'): 0.9999999323715897AFTER
CONCAT(a.value_mean / b.value_mean * 100 - 100, 'AFTER'): -0.0000067628410249653825AFTER
GROUP_CONCAT(CONCAT(a.value_mean / b.value_mean, 'AFTER')): 0.9999999323715897AFTER
GROUP_CONCAT(CONCAT(a.value_mean / b.value_mean * 100 - 100, 'AFTER')): -0.0000067628410249653825AFT
1 row in set (0.001 sec)
This next query shows it even better, though it's harder to follow: the SELECT now actually has two rows to GROUP_CONCAT, and you can see the cut-off characters are in the middle of the GROUP_CONCAT result:
MariaDB [testdb]> SELECT
a.value_mean,
b.value_mean,
HEX(a.value_mean),
HEX(b.value_mean),
a.value_mean / b.value_mean,
a.value_mean / b.value_mean * 100 - 100,
CONCAT(a.value_mean / b.value_mean, 'AFTER'),
CONCAT(a.value_mean / b.value_mean * 100 - 100, 'AFTER'),
GROUP_CONCAT(CONCAT(a.value_mean / b.value_mean, 'AFTER')),
GROUP_CONCAT(CONCAT(a.value_mean / b.value_mean * 100 - 100, 'AFTER'))
FROM
baseline_result AS a
JOIN baseline_result AS b
WHERE
a.id = 125716755
AND b.id in (125717382, 125717383)
\G
*************************** 1. row ***************************
value_mean: 15141600000
value_mean: 15141600000
HEX(a.value_mean): 38681F000
HEX(b.value_mean): 38681F400
a.value_mean / b.value_mean: 0.9999999323715897
a.value_mean / b.value_mean * 100 - 100: -0.0000067628410249653825
CONCAT(a.value_mean / b.value_mean, 'AFTER'): 0.9999999323715897AFTER
CONCAT(a.value_mean / b.value_mean * 100 - 100, 'AFTER'): -0.0000067628410249653825AFTER
GROUP_CONCAT(CONCAT(a.value_mean / b.value_mean, 'AFTER')): 0.9999999323715897AFTER,0.8918810968677036AFTER
GROUP_CONCAT(CONCAT(a.value_mean / b.value_mean * 100 - 100, 'AFTER')): -0.0000067628410249653825AFT,-10.81189031322964AFTER
1 row in set (0.001 sec)
[root@test ~]# mysql --version
mysql Ver 15.1 Distrib 5.5.56-MariaDB, for Linux (x86_64) using readline 5.1
I'm a bit stumped by this. If the float values are slightly different, the issue is avoided, so I've just been getting lucky, I guess. A larger query failed and I narrowed the issue down to these two floats, but now I'm stuck.
I can avoid the issue by rounding the values being GROUP_CONCAT-ed, but that isn't an ideal solution.
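Since rounding isn't ideal, another angle (hinted at in the title) is casting: forcing an explicit string type before GROUP_CONCAT sees the expression, so its length estimate can't truncate the value. This is an untested sketch, not a verified fix:

```sql
-- Hypothetical workaround: CAST the arithmetic result to CHAR first,
-- so GROUP_CONCAT aggregates a plain string rather than the
-- badly-length-estimated numeric expression
GROUP_CONCAT( CONCAT( CAST(a.value_mean / b.value_mean * 100 - 100 AS CHAR), 'AFTER') )
```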
More info: the column is a FLOAT, but just selecting value_mean the values are close enough that they print the same. Printing HEX() shows the stored values better.
MariaDB [testdb]> SELECT
id,
value_mean,
HEX(value_mean)
FROM baseline_result
WHERE id in (125716755,125717382,125717383);
+-----------+-------------+-----------------+
| id        | value_mean  | HEX(value_mean) |
+-----------+-------------+-----------------+
| 125716755 | 15141600000 | 38681F000       |
| 125717382 | 15141600000 | 38681F400       |
| 125717383 | 16977100000 | 3F3EA2800       |
+-----------+-------------+-----------------+
3 rows in set (0.000 sec)
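Those HEX() values are the key to the "prints the same" remark: 0x38681F000 (15141564416) and 0x38681F400 (15141565440) are two different stored values that both round to 15141600000 at the roughly 6 significant digits a FLOAT is displayed with. A quick check of the arithmetic in Python (not SQL):

```python
# The two stored FLOAT values, recovered from the HEX() output above
a = 0x38681F000  # 15141564416
b = 0x38681F400  # 15141565440

# Rounded to 6 significant digits (as in the FLOAT display), they match
print(float(f"{a:.5e}"))  # 15141600000.0
print(float(f"{b:.5e}"))  # 15141600000.0

# But the stored values differ by 1024, so the ratio is not exactly 1
print(a / b)
```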
MariaDB [testdb]> SELECT VERSION();
+----------------+
| VERSION()      |
+----------------+
| 5.5.56-MariaDB |
+----------------+
1 row in set (0.000 sec)
Here is an easier repro with a more recent MariaDB:
MariaDB [(none)]> SELECT VERSION();
+---------------------------------------+
| VERSION()                             |
+---------------------------------------+
| 10.5.13-MariaDB-1:10.5.13+maria~focal |
+---------------------------------------+
1 row in set (0.000 sec)
MariaDB [(none)]> select GROUP_CONCAT(a) FROM ( SELECT CONCAT(conv('38681F000', 16, 10) / conv('38681F400', 16, 10) * 100 - 100, 'AFTER') as a) xxx;
+------------------------------+
| GROUP_CONCAT(a)              |
+------------------------------+
| -0.0000067628410249653825AFT |
+------------------------------+
1 row in set (0.000 sec)

Related

Rolling average over time with multiple values per date

I'm trying to calculate a rolling average for each row of a table, based on a sliding time window that looks a certain number of days ahead and behind.
Given the following table:
myTable
+------------+-------+
| Date       | Value |
+------------+-------+
| 31/05/2020 |     5 |
| 31/05/2020 |    10 |
| 01/06/2020 |    50 |
| 01/08/2020 |    27 |
+------------+-------+
and the measure
myMeasure =
VAR LookAheadAndBehindInDays = 28
RETURN
    AVERAGEX (
        DATESINPERIOD (
            myTable[Date],
            DATEADD ( LASTDATE ( myTable[Date] ), LookAheadAndBehindInDays, DAY ),
            -2 * LookAheadAndBehindInDays,
            DAY
        ),
        myTable[Value]
    )
I checked that DATESINPERIOD does return the right dates. My problem lies in the calculation of the average.
Instead of calculating the average of all values directly (the expected result):
+------------+-------+---------------------------+
| Date       | Value | myMeasure                 |
+------------+-------+---------------------------+
| 31/05/2020 |     5 | (5 + 10 + 50) / 3 = 21.66 |
| 31/05/2020 |    10 | (5 + 10 + 50) / 3 = 21.66 |
| 01/06/2020 |    50 | (5 + 10 + 50) / 3 = 21.66 |
| 01/08/2020 |    27 | 27 / 1 = 27               |
+------------+-------+---------------------------+
it first calculates the average for each date, and then the average of those per-date averages:
+------------+-------+--------------------+------------------------+
| Date       | Value | Avg. by Date       | myMeasure              |
+------------+-------+--------------------+------------------------+
| 31/05/2020 |     5 | (5 + 10) / 2 = 7.5 | (7.5 + 50) / 2 = 28.75 |
| 31/05/2020 |    10 | (5 + 10) / 2 = 7.5 | (7.5 + 50) / 2 = 28.75 |
| 01/06/2020 |    50 | 50 / 1 = 50        | (7.5 + 50) / 2 = 28.75 |
| 01/08/2020 |    27 | 27 / 1 = 27        | 27 / 1 = 27            |
+------------+-------+--------------------+------------------------+
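Numerically, the two behaviors for the window around 31/05/2020 come out like this (a quick check in Python rather than DAX, using the sample values):

```python
# Values in the window, grouped by date as AVERAGEX sees them
by_date = {"31/05/2020": [5, 10], "01/06/2020": [50]}

# Expected: every value weighted equally
values = [v for vs in by_date.values() for v in vs]
print(round(sum(values) / len(values), 2))  # 21.67 (i.e. 65 / 3)

# Observed: average per date first, then average those averages
per_date = [sum(vs) / len(vs) for vs in by_date.values()]
print(sum(per_date) / len(per_date))  # 28.75 (i.e. (7.5 + 50) / 2)
```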
I found out about this behavior by using this measure:
myMeasure DEBUG =
VAR LookAheadAndBehindInDays = 28
VAR vTable =
    DATESINPERIOD (
        myTable[Date],
        DATEADD ( LASTDATE ( myTable[Date] ), LookAheadAndBehindInDays, DAY ),
        -2 * LookAheadAndBehindInDays,
        DAY
    )
RETURN
    FIRSTDATE ( vTable ) & " - " & LASTDATE ( vTable ) & UNICHAR ( 10 )
        & " - Row Count: " & COUNTROWS ( vTable ) & UNICHAR ( 10 )
        & " - Avg: " & AVERAGEX ( vTable, myTable[Value] ) & UNICHAR ( 10 )
        & " - Dates: " & CONCATENATEX ( vTable, myTable[Date], "," ) & UNICHAR ( 10 )
        & " - Values: " & CONCATENATEX ( vTable, myTable[Value], "," )
This returns, for the rows with the date '31/05/2020', the following value:
31/05/2020 - 01/06/2020
Row Count: 2
Avg: 28.75
Dates: 31/05/2020,01/06/2020
Values: 7.5,50
Most notable are the row count of 2, which I would expect to be 3, and the values 7.5,50, where I would expect 5,10,50 (as reflected in the tables above).
So my question is: how can I calculate the rolling average over time weighting each value equally, instead of weighting each day equally?
I'm not sure I completely understood the problem, but it seems to me you just need a standard AVERAGE, not the AVERAGEX iterator.
I've changed the formula a bit and didn't use DATESINPERIOD; this version achieves the same result and (to me) is clearer and more readable:
Avg =
VAR DaysInterval = 28
RETURN
    CALCULATE (
        AVERAGE ( myTable[Value] ),
        DATESBETWEEN (
            myTable[Date],
            MAX ( myTable[Date] ) - DaysInterval, -- from
            MAX ( myTable[Date] ) + DaysInterval  -- to
        )
    )
Here is the result (based on the sample dataset).
What you are looking for is the average calculated over the days -/+ 28:
myMeasure =
VAR LookAheadAndBehindInDays = 28
VAR CurDate = rolling[Date]
RETURN
    CALCULATE (
        AVERAGE ( rolling[Value] ),
        FILTER (
            rolling,
            rolling[Date] + LookAheadAndBehindInDays >= CurDate
                && rolling[Date] - LookAheadAndBehindInDays <= CurDate
        )
    )
As you can see, I am using FILTER to get the rows falling in the date range and calculating the average over those.

How to sum up a measure based on different levels in Power BI using DAX

I have the following table structure:
| Name 1 | Name 2 | Month | Count 1 | Count 2 | SumCount |
|--------|--------|-------|---------|---------|----------|
| A      | E      | 1     | 5       | 3       | 8        |
| A      | E      | 2     | 1       | 6       | 7        |
| A      | F      | 3     | 3       | 4       | 7        |
Now I calculate the following with a DAX measure:
Measure = ( SUM ( Table[Count 2] ) - SUM ( Table[Count 1] ) ) * SUM ( Table[SumCount] )
I can't use a calculated column, because then the formula is applied before excluding a level (e.g. Month). With Month excluded, my table structure would look like this:
| Name 1 | Name 2 | Count 1 | Count 2 | SumCount | Measure |
|--------|--------|---------|---------|----------|---------|
| A      | E      | 6       | 9       | 15       | 45      |
| A      | F      | 3       | 4       | 7        | 7       |
I added a table to the view which only displays Name 1, in which case the measure will of course sum up Count 1, Count 2 and SumCount first and then apply the formula, which leads to the following result:
| Name 1 | Measure |
|--------|---------|
| A      | 88      |
But the desired result should be
| Name 1 | Measure |
|--------|---------|
| A      | 52      |
which is the sum of Measure.
So basically I want the calculation Measure = ( SUM ( Table[Count 2] ) - SUM ( Table[Count 1] ) ) * SUM ( Table[SumCount] ) on my base level, but when drilling up and grouping those names it should only perform a sum.
An iterator function like SUMX is what you want here since you are trying to sum row by row rather than aggregating first.
Measure = SUMX ( Table, ( Table[Count 2] - Table[Count 1] ) * Table[SumCount] )
Any filters you have will be applied to the first argument, Table, and it will only sum the corresponding rows.
Edit:
If I'm understanding correctly, you want to aggregate over Month before taking the difference and product. One way to do this is by summarizing (excluding Month) before using SUMX like this:
Measure =
VAR Summary =
    SUMMARIZE (
        Table,
        Table[Name 1],
        Table[Name 2],
        "Count1Sum", SUM ( Table[Count 1] ),
        "Count2Sum", SUM ( Table[Count 2] ),
        "SumCountSum", SUM ( Table[SumCount] )
    )
RETURN
    SUMX ( Summary, ( [Count2Sum] - [Count1Sum] ) * [SumCountSum] )
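As a sanity check on the numbers: aggregating everything first reproduces the unwanted 88, while summarizing by (Name 1, Name 2) before multiplying gives the desired 52. A quick check in Python, just mirroring the sample data:

```python
from collections import defaultdict

rows = [  # (Name 1, Name 2, Month, Count 1, Count 2, SumCount)
    ("A", "E", 1, 5, 3, 8),
    ("A", "E", 2, 1, 6, 7),
    ("A", "F", 3, 3, 4, 7),
]

# Aggregate first, then apply the formula (what the original measure does)
c1 = sum(r[3] for r in rows)
c2 = sum(r[4] for r in rows)
sc = sum(r[5] for r in rows)
print((c2 - c1) * sc)  # (13 - 9) * 22 = 88

# Group by (Name 1, Name 2) first, then sum the per-group products
groups = defaultdict(lambda: [0, 0, 0])
for n1, n2, _, a, b, s in rows:
    g = groups[(n1, n2)]
    g[0] += a
    g[1] += b
    g[2] += s
print(sum((b - a) * s for a, b, s in groups.values()))  # 45 + 7 = 52
```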
You don't want a measure in this case; rather, you need a new column.
The same formula as a new column will give your desired result:
Column = ( 'Table (2)'[Count2] - 'Table (2)'[Count1] ) * 'Table (2)'[SumCount]

Power BI - weighted average yield across 2 tables of a given date

I would like to calculate the average yield across two related tables as of a given date.
Table1                             Table2
+----+------------+--------+      +----+------------+-------+
| ID | TradeDate  | Amount |      | ID | TradeDate  | Yield |
+----+------------+--------+      +----+------------+-------+
| 1  | 2018/11/30 |    100 |      | 1  | 2018/11/8  | 2.2%  |
| 1  | 2018/11/8  |    101 |      | 1  | 2018/8/8   | 2.1%  |
| 1  | 2018/10/31 |    102 |      | 1  | 2018/5/8   | 2.0%  |
| 1  | 2018/9/30  |    103 |      | 2  | 2018/9/8   | 1.7%  |
| 2  | 2018/11/30 |    200 |      | 2  | 2018/6/8   | 1.6%  |
| 2  | 2018/10/31 |    203 |      | 2  | 2018/3/8   | 1.5%  |
| 2  | 2018/9/30  |    205 |      | 3  | 2018/10/20 | 1.7%  |
| 3  | 2018/11/30 |    300 |      | 3  | 2018/7/20  | 1.6%  |
| 3  | 2018/10/31 |    300 |      | 3  | 2018/4/20  | 1.6%  |
| 3  | 2018/9/30  |    300 |      +----+------------+-------+
+----+------------+--------+
I create a table named 'DateList' and use a slicer to select a specified date.
Screen Shot DateList.
I want to achieve the following result:
as of *11/9/2018*
+-------+------------+-------+-----------------+-----------+
| ID    | LastDate   | Value | LatestYieldDate | LastYield |
+-------+------------+-------+-----------------+-----------+
| 1     | 2018/11/8  |   101 | 2018/11/8       | 2.2%      |
| 2     | 2018/10/31 |   203 | 2018/9/8        | 1.7%      |
| 3     | 2018/10/31 |   300 | 2018/10/20      | 1.7%      |
+-------+------------+-------+-----------------+-----------+
| Total |            |   604 |                 | 1.7836%   |
+-------+------------+-------+-----------------+-----------+
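The 1.7836% in the Total row is the amount-weighted average of the three yields, which is what the measure ultimately has to produce. A quick arithmetic check in Python (not DAX):

```python
amounts = [101, 203, 300]  # Value per ID as of 11/9/2018
yields_ = [2.2, 1.7, 1.7]  # LastYield per ID, in percent

print(sum(amounts))  # 604
weighted = sum(a * y for a, y in zip(amounts, yields_)) / sum(amounts)
print(round(weighted, 4))  # 1.7836
```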
Currently, I use the following formulas to achieve a partial result.
Create 2 measures in Table1:
LastDate =
VAR SlicerDate = MIN ( DateList[Date] )
VAR MinDiff =
    MINX (
        FILTER ( ALL ( Table1 ), Table1[ID] IN VALUES ( Table1[ID] ) ),
        ABS ( SlicerDate - Table1[TradeDate] )
    )
RETURN
    MINX (
        FILTER (
            ALL ( Table1 ),
            Table1[ID] IN VALUES ( Table1[ID] )
                && ABS ( SlicerDate - Table1[TradeDate] ) = MinDiff
        ),
        Table1[TradeDate]
    )
Value = CALCULATE(SUM(Table1[Amount]), FILTER(Table1, Table1[TradeDate] = [LastDate]))
Create 2 measures in Table2:
LastYieldDate =
VAR SlicerDate = MIN ( DateList[Date] )
VAR MinDiff =
    MINX (
        FILTER ( ALL ( Table2 ), Table2[ID] IN VALUES ( Table2[ID] ) ),
        ABS ( SlicerDate - Table2[TradeDate] )
    )
RETURN
    MINX (
        FILTER (
            ALL ( Table2 ),
            Table2[ID] IN VALUES ( Table2[ID] )
                && ABS ( SlicerDate - Table2[TradeDate] ) = MinDiff
        ),
        Table2[TradeDate]
    )
LastYield = CALCULATE(SUM(Table2[Yield]), FILTER(Table2,
Table2[TradeDate] = [LastYieldDate]))
I have no idea how to calculate the right average yield across the 2 tables.
Here is my current result.
Screen Shot Current Result.
You'll first need to create a bridge table for the ID values so you can work with both tables more easily.
IDList = VALUES(Table1[ID])
Now we'll use IDList[ID] on our visual instead of the ID from one of the other tables.
The measure we use for the average last yield is a basic sum-product average:
LastYieldAvg =
DIVIDE(
SUMX(IDList, [Value] * [LastYield]),
SUMX(IDList, [Value])
)
Note that when there is only a single ID value, it simplifies to
[Value] * [LastYield] / [Value] = [LastYield]

Power BI - max date or nearest date of selected date

Objective: sum up the nearest date's value for a given date.
Here is my data
Table: MyData
+----+------------+-------+
| ID | TradeDate  | Value |
+----+------------+-------+
| 1  | 2018/11/30 |   105 |
| 1  | 2018/11/8  |   101 |
| 1  | 2018/10/31 |   100 |
| 1  | 2018/9/30  |   100 |
| 2  | 2018/11/30 |   200 |
| 2  | 2018/10/31 |   201 |
| 2  | 2018/9/30  |   205 |
| 3  | 2018/11/30 |   300 |
| 3  | 2018/10/31 |   305 |
| 3  | 2018/9/30  |   301 |
+----+------------+-------+
I create a table named 'DateList' and use a slicer to select a specified date.
DateList Slicer
I want to achieve the following result:
as of *11/9/2018*
+-------+------------+-------+
| ID    | TradeDate  | Value |
+-------+------------+-------+
| 1     | 2018/11/8  |   101 |
| 2     | 2018/10/31 |   201 |
| 3     | 2018/10/31 |   305 |
+-------+------------+-------+
| Total |            |   607 |
+-------+------------+-------+
Currently, I am trying the following steps to achieve the above result.
First, I want to find the nearest date in the table 'MyData' using a new measure:
MyMaxDate = CALCULATE ( MAX ( MyData[TradeDate] ), FILTER ( MyData, MyData[TradeDate] <= FIRSTDATE ( DateList[Date] ) ) )
Second, I create a new measure "MySum" to sum up the values whose [TradeDate] equals [MyMaxDate]:
MySum = CALCULATE ( SUM ( MyData[Value] ), FILTER ( MyData, MyData[TradeDate] = [MyMaxDate] ) )
Third, I create a matrix to show the result (see Result).
Unfortunately, the result 1313 is different from my goal 607.
So, how can I fix my DAX formula to achieve the right result?
Many thanks
You can calculate the closest date by taking a min over the difference in dates and then taking the minimal date with that minimal difference.
MyDate =
VAR SlicerDate = MIN ( DateList[Date] )
VAR MinDiff =
    MINX (
        FILTER ( ALL ( MyData ), MyData[ID] IN VALUES ( MyData[ID] ) ),
        ABS ( SlicerDate - MyData[TradeDate] )
    )
RETURN
    MINX (
        FILTER (
            ALL ( MyData ),
            MyData[ID] IN VALUES ( MyData[ID] )
                && ABS ( SlicerDate - MyData[TradeDate] ) = MinDiff
        ),
        MyData[TradeDate]
    )
From there you can create the summing measure fairly easily:
MySum = CALCULATE(SUM(MyData[Value]), FILTER(MyData, MyData[TradeDate] = [MyDate]))
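Against the sample data this closest-date logic picks 2018/11/8 for ID 1 and 2018/10/31 for IDs 2 and 3, giving the 607 total. A quick check in Python (not DAX) mirroring the minimum-difference approach:

```python
from datetime import date

# Sample rows from MyData: (ID, TradeDate, Value)
rows = [
    (1, date(2018, 11, 30), 105), (1, date(2018, 11, 8), 101),
    (1, date(2018, 10, 31), 100), (1, date(2018, 9, 30), 100),
    (2, date(2018, 11, 30), 200), (2, date(2018, 10, 31), 201),
    (2, date(2018, 9, 30), 205),
    (3, date(2018, 11, 30), 300), (3, date(2018, 10, 31), 305),
    (3, date(2018, 9, 30), 301),
]
slicer = date(2018, 11, 9)  # the DateList slicer selection

total = 0
for i in {r[0] for r in rows}:
    mine = [r for r in rows if r[0] == i]
    # Same idea as the measure: smallest |SlicerDate - TradeDate| per ID
    min_diff = min(abs((slicer - d).days) for _, d, _ in mine)
    nearest = min(d for _, d, _ in mine if abs((slicer - d).days) == min_diff)
    total += sum(v for _, d, v in mine if d == nearest)

print(total)  # 607
```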

How to calculate an expression and group by 2 fields in DAX?

I want to write an expression in DAX that will group by 2 fields: AgentID and LoginDate. Here is the expression:
Average Availability % Per Day = (LoginTime + WorkTime) / (LoginTime + WorkTime + BreakTime)
What I have written in DAX so far is:
Average Availability % Per Day =
AVERAGEX (
    VALUES ( Logins[LoginDay] ),
    DIVIDE (
        SUM ( Logins[LoginDuration] ) + SUM ( Logins[WorkDuration] ),
        SUM ( Logins[LoginDuration] ) + SUM ( Logins[WorkDuration] )
            + SUM ( Logins[BreakDuration] )
    )
)
However, the problem is that the expression sums everything and then takes the average, as opposed to evaluating the expression per day and per AgentID before calculating the average.
EDIT: Adding sample data:
AgentID | LoginDay  | LoginDuration | BreakDuration | WorkDuration
96385   | 7/5/2018  |         14472 |           803 |         1447
96385   | 7/6/2018  |         14742 |           857 |         1257
96385   | 7/12/2018 |         14404 |           583 |          291
96385   | 7/13/2018 |         14276 |           636 |          368
96385   | 7/19/2018 |         14456 |           788 |          543
96385   | 7/20/2018 |         14550 |           390 |         1727
96385   | 7/26/2018 |         66670 |         53224 |         1076
96385   | 7/27/2018 |         14592 |           277 |         1928
So, for example, for this agent I am getting an average availability % per day of .75 when it should really be .91.
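For reference, the .91 target is the mean of the eight per-day availability ratios, while .75 is what you get by totaling all durations first and dividing once. This quick check in Python (not DAX) reproduces both from the sample data:

```python
rows = [  # (LoginDuration, BreakDuration, WorkDuration) per LoginDay
    (14472, 803, 1447), (14742, 857, 1257),
    (14404, 583, 291),  (14276, 636, 368),
    (14456, 788, 543),  (14550, 390, 1727),
    (66670, 53224, 1076), (14592, 277, 1928),
]

# Desired: availability per day, then the mean of those ratios
per_day = [(l + w) / (l + w + b) for l, b, w in rows]
print(round(sum(per_day) / len(per_day), 2))  # 0.91

# Observed: sum all durations first, then divide once
L = sum(l for l, b, w in rows)
B = sum(b for l, b, w in rows)
W = sum(w for l, b, w in rows)
print(round((L + W) / (L + W + B), 2))  # 0.75
```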