Count Specific Word by Row in DAX - powerbi

I'm trying to count the number of times the word "text" appears per row in Power BI. I've done a lot of google searching and seen examples like this:
Formula :=
CALCULATE (
COUNTROWS ( FILTER ( 'TestData', FIND ( "text", 'TestData'[Description],, 0 ) > 0 ) ),1=1
)
but it isn't quite getting me there. How can I get for row 1, a result of 1 and row 2, a result of 3.
CREATE TABLE [dbo].[TestData](
[ID] [int] IDENTITY(1,1) NOT NULL,
[Description] [varchar](100) NULL
) ON [PRIMARY]
GO
SET IDENTITY_INSERT [dbo].[TestData] ON
GO
INSERT [dbo].[TestData] ([ID], [Description]) VALUES (1, N'this is my demo text')
GO
INSERT [dbo].[TestData] ([ID], [Description]) VALUES (2, N'text text demo text')
GO
SET IDENTITY_INSERT [dbo].[TestData] OFF
GO
Expected Result
ID Description Text Word Count
1 this is my demo text 1
2 text text demo text 3

Calculated Column:
=
VAR MySearchText = "text"
RETURN
DIVIDE(
LEN( Table1[Description] )
- LEN( SUBSTITUTE( Table1[Description], MySearchText, "" ) ),
LEN( MySearchText )
)
Measure:
=
VAR MySearchText = "text"
VAR ThisDescription =
MIN( Table1[Description] )
RETURN
DIVIDE(
LEN( ThisDescription )
- LEN( SUBSTITUTE( ThisDescription, MySearchText, "" ) ),
LEN( MySearchText )
)
though note that both of these will return positive counts where MySearchText is found within other words: a description of "this is textual", for example, will return 1.

I don't believe DAX has a textsplit function, but you can do something like this to ensure you don't pick up words of which text is a substring.
Text Count (DAX) =
VAR pad = SUBSTITUTE(" " & [Description] & " ","text","~text~")
VAR lenPad = LEN(pad)
VAR lenText = LEN("~text~")
VAR lenRemText = LEN(SUBSTITUTE(pad,"~text~",""))
RETURN (lenPad-lenRemText)/lenText
```

Related

Convert comma separated text to a list of numbers

In Power BI, I have created a DAX query created with a var giving comma-separated text using CONCATENATEX function.
Output like
Var = "1,2,3,4,5,6"
Now I want to search this var into my table column with syntax
Table[col] in {var}.
It is throwing an error.
I even tried converting the column to string with syntax
Convert(table[col], string) in {var}
The error is removed but the data doesn't match with column data.
I found this Power BI community thread that is essentially the same question.
The solution is to convert the string into a path object.
Here's the code from the link:
Measure 3 =
VAR mymeasure =
SUBSTITUTE ( [Measure], ",", "|" )
VAR Mylen =
PATHLENGTH ( mymeasure )
VAR mytable =
ADDCOLUMNS (
GENERATESERIES ( 1, mylen ),
"mylist", VALUE ( PATHITEM ( mymeasure, [Value] ) )
)
VAR mylist =
SELECTCOLUMNS ( mytable, "list", [mylist] )
RETURN
CALCULATE ( COUNTROWS ( Table1 ), Table1[ID] IN mylist )
There's a bit of redundancy in the above, so I'd probably condense it a bit and write it like this:
SomeMeasureFilteredByVar =
VAR PathVar = SUBSTITUTE ( [Var], ",", "|" )
VAR VarList =
SELECTCOLUMNS (
GENERATESERIES ( 1, PATHLENGTH ( PathVar ) ),
"Item", VALUE ( PATHITEM ( PathVar, [Value] ) )
)
RETURN
CALCULATE ( [SomeMeasure], 'Table'[ID] IN VarList )
Edit: To just create a calculated table, you can skip the last step:
TableFromString =
VAR PathVar = SUBSTITUTE ( "1,2,3,4,5,6", ",", "|" )
RETURN
SELECTCOLUMNS (
GENERATESERIES ( 1, PATHLENGTH ( PathVar ) ),
"Item", VALUE ( PATHITEM ( PathVar, [Value] ) )
)
Note that it is not possible to create a calculated table that is dynamically responsive to report filters and slicers. Materialized calculated tables are only calculated once when the data loads, not every time you adjust a filter or slicer.
Therefore, you can use a measure in the table instead of the string "1,2,3,4,5,6" but the table output will be the same regardless of what the measure returns within different filter contexts.

How to show rows when column match criteria?

How can I define a measure to set a condition to show values of apples when column = fruits contains the string apple?
I've tried this but I only see query batch complete with errors.
MEASURE Table1[apples] = IF(SEARCH("apple",'Table1'[Fruits]),"Yes","No")
Assuming you want to return a sum of the column 'Table1'[Value] for those rows where the 'Table1'[Fruits]-column contains the string "apple", you should use:
MEASURE Table1[apples] =
CALCULATE (
SUM ( 'Table1'[Value] ),
KEEPFILTERS(
SEARCH ( "apple", 'Table1'[Fruits], 1, 0 ) <> 0
)
)

If it is the first date or last date use one aggregation; if any other day, use another aggregation

Let's assume I have the below dataset.
What I need to create the below matrix where if it is the beginning or month end, I aggregate A or B in Category 1 and calculate SUM but if it is any other day in a month but 1st or last, I am tagging A or B in Category 2 and calculate SUM. I guess I need to use SWITCH, don't I?
Edit in info from comments
Like to create 3 col:
isStart = IF ( main_table[date] = STARTOFMONTH ( main_table[date] ), 1, 0 )
isEnd = IF ( main_table[date] = ENDOFMONTH ( 'main_table'[date] ), 1, 0 )
in_between_date =
IF ( AND ( main_table[date] <> ENDOFMONTH ( 'main_table'[date] ),
main_table[date] <> STARTOFMONTH ( main_table[date] ) ), 1, 0 )
Then, create the columns with my categories, like
start_end =
IF ( OR ( NOT ( ISERROR ( SEARCH ( "A", main_table[code] ) ) ),
main_table[code] = "B" ),
"Category 1",
BLANK () )
and
in_between =
IF ( OR ( main_table[code] = "B", main_table[code] = "A" ), "Category 2", BLANK () )
But then, what should I use in switch/if ? = if(VALUES('main_table'[isStart]) = 1, then what?
You where on the right track but overcomplicated a bit. You only need one extra column "Category" giving for each row in what category the item falls.
Category =
IF (
startEnd[date] = STARTOFMONTH ( startEnd[date] )
|| startEnd[date] = ENDOFMONTH ( startEnd[date] );
"Category1";
"Category2"
)
table end result is:

Find the last value after the dot in DAX e.g. in value1.value2.value3 finds value3

I have a column with a dotted value e.g
project.phase.task
project.task
project.phase.subphase.task
in each instance I want the last doted value.
I am using the DAX function:
lastVal =
var i = FIND(".",'My Table'[wbs],1,-1)
return
if(i>0,mid('My Table'[wbs],i+1,99),"")
There is no FindLast function. How would I simulate it?
This would be a similar problen to finding a file extension in a file name or path.
You don't need to do this with DAX. You can use Split Column function of the Query Editor to get the part of the text after the right-most dot.
You can do this with DAX.
As a calculated column:
lastVal =
PATHITEMREVERSE (
SUBSTITUTE (
'My Table'[wbs],
".",
"|"
),
1
)
As a measure:
lastValMeasure =
VAR MyString =
IF (
HASONEVALUE ( 'My Table'[wbs] ),
FIRSTNONBLANK ( 'My Table'[wbs], 1 ),
BLANK()
)
RETURN
PATHITEMREVERSE (
SUBSTITUTE (
MyString,
".",
"|"
),
1
)
You can turn the string into an indexed list of characters and then take the characters after the maximally indexed ..
This should work as a calculated column:
lastVal =
VAR String = 'My Table'[wbs]
VAR StringLength = LEN ( String )
VAR StringToTable =
ADDCOLUMNS (
GENERATESERIES ( 1, StringLength ),
"Char", MID ( String, [Value], 1 )
)
VAR LastDot = MAXX ( FILTER ( StringToTable, [Char] = "." ), [Value] )
RETURN
RIGHT ( String, StringLength - LastDot )
If you need it as a measure, then simply adjust the String variable to take the appropriate aggregate. For example, MAX('My Table'[wbs]) or FIRSTNONBLANK('My Table'[wbs], 'My Table'[wbs]).

Intersection of Customer product purchase (powerBI)

I need help with producing a count of the intersections between customers and which items they have purchased. For example, if there are 5 products, a customer can purchase any single product or any combination of the 5. Customers can also re-purchase a product at any date - this is where my problem arises as an end user wants to be able to see the intersections for any selected date range.
I have managed to come up with a solution which includes the use of parameters but this is not ideal as the end user does not have access to change any parameters of the report.
I'm open to any solution that does not involve parameters, ideally a slicer with dates would be the best solution
The fields I have on the table are customer_ID, date_ID, and product
Example Data
customer_id date_id product
1 9/11/2018 A
1 10/11/2018 A
1 10/11/2018 B
1 11/11/2018 C
1 11/11/2018 A
2 9/11/2018 C
2 10/11/2018 D
2 11/11/2018 E
2 11/11/2018 A
3 10/11/2018 A
3 10/11/2018 B
3 11/11/2018 A
3 11/11/2018 B
3 11/11/2018 B
4 10/11/2018 A
4 11/11/2018 A
5 9/11/2018 A
5 10/11/2018 B
5 10/11/2018 E
5 10/11/2018 D
5 11/11/2018 C
5 11/11/2018 A
6 9/11/2018 A
6 10/11/2018 A
6 11/11/2018 A
Possible output with different slicer selections
Any help at all would be greatly appreciated
This is pretty tricky since I can't think of a way to use the values of a dynamically calculated table as a field in a visual. (You can create calculated tables, but those aren't responsive to slicers. You can also create dynamically calculated tables inside of a measure, but measures don't return tables, just single values.)
The only way I can think of to do this requires creating a table for every possible product combination. However, if you have N products, then this table has 2N rows and that blows up fast.
Here's a calculated table that will output all the combinations:
Table2 =
VAR N = DISTINCTCOUNT(Table1[product])
VAR Products = SUMMARIZE(Table1,
Table1[product],
"Rank",
RANKX(ALL(Table1),
Table1[product],
MAX(Table1[product]),
ASC,
Dense
)
)
VAR Bits = SELECTCOLUMNS(GENERATESERIES(1, N), "Bit", [Value])
VAR BinaryString =
ADDCOLUMNS(
GENERATESERIES(1, 2^N),
"Binary",
CONCATENATEX(
Bits,
MOD( TRUNC( [Value] / POWER(2, [Bit]-1) ), 2)
,,[Bit]
,DESC
)
)
RETURN
ADDCOLUMNS(
BinaryString,
"Combination",
CONCATENATEX(Products, IF(MID([Binary],[Rank],1) = "1", [product], ""), "")
)
Then add a calculated column to get the column delimited version:
Delimited =
VAR Length = LEN(Table2[Combination])
RETURN
CONCATENATEX(
GENERATESERIES(1,Length),
MID(Table2[Combination], [Value], 1),
","
)
If you put Delimited the Rows section on a matrix visual and the following measure in the Values section:
customers =
VAR Summary = SUMMARIZE(Table1,
Table1[customer_id],
"ProductList",
CONCATENATEX(VALUES(Table1[product]), Table1[product], ","))
RETURN SUMX(Summary, IF([ProductList] = MAX(Table2[Delimited]), 1, 0))
And filter out any 0 customer values, you should get something like this:
So yeah... not a great solution, especially when N gets big, but maybe better than nothing?
Edit:
In order to work for longer product names, let's use a delimiter in the Combination concatenation:
CONCATENATEX(Products, IF(MID([Binary],[Rank],1) = "1", [product], ""), ",")
(Note the "" to "," change at the end.)
And then rewrite the Delimited calculated column to remove excess commas.
Delimited =
VAR RemoveMultipleCommas =
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(Table2[Combination], ",,", ","),
",,", ","),
",,", ","),
",,", ",")
VAR LeftComma = (LEFT(Table2[Combination]) = ",")
VAR RightComma = (RIGHT(Table2[Combination]) = ",")
RETURN
IF(RemoveMultipleCommas <> ",",
MID(RemoveMultipleCommas,
1 + LeftComma,
LEN(RemoveMultipleCommas) - RightComma - LeftComma
), "")
Finally, let's modify the customers measure a bit so it can subtotal.
customers =
VAR Summary = SUMMARIZE(Table1,
Table1[customer_id],
"ProductList",
CONCATENATEX(VALUES(Table1[product]), Table1[product], ","))
VAR CustomerCount = SUMX(Summary, IF([ProductList] = MAX(Table2[Delimited]), 1, 0))
VAR Total = IF(ISFILTERED(Table2[Delimited]), CustomerCount, COUNTROWS(Summary))
RETURN IF(Total = 0, BLANK(), Total)
The Total variable gives the total customer count for the total. Note that I've also set zeros to return as blank so that you don't need to filter out zeros (it will automatically hide those rows).
You can also try this measure to calculate the result.
[Count Of Customers] :=
VAR var_products_selection_count = DISTINCTCOUNT ( Sales[product] )
VAR var_customers = VALUES ( Sales[customer_id] )
VAR var_customers_products_count =
ADDCOLUMNS(
var_customers,
"products_count",
VAR var_products_count =
COUNTROWS (
FILTER (
CALCULATETABLE ( VALUES ( Sales[product] ) ),
CONTAINS (
Sales,
Sales[product],
Sales[product]
)
)
)
RETURN var_products_count
)
RETURN
COUNTROWS (
FILTER (
var_customers_products_count,
[products_count] = var_products_selection_count
)
)
I think I've found a better solution/workaround that doesn't require precomputing all possible combinations. The key is to use a rank/index as a base column and then built off of that.
Since the customer_id is already nicely indexed starting from 1 with no gaps, in this case, I will use that, but if it weren't, then you'd want to create an index column to use instead. Note that there cannot be more distinct product combinations within a given filter context than there are customers since each customer only has a single combination.
For each index/rank we want to find the product combination that is associated with it and the number of customers for that combination.
ProductCombo =
VAR PerCustomer =
SUMMARIZE (
ALLSELECTED ( Table1 ),
Table1[customer_id],
"ProductList",
CONCATENATEX ( VALUES ( Table1[product] ), Table1[product], "," )
)
VAR ProductSummary =
SUMMARIZE (
PerCustomer,
[ProductList],
"Customers",
DISTINCTCOUNT ( Table1[customer_id] )
)
VAR Ranked =
ADDCOLUMNS (
ProductSummary,
"Rank",
RANKX (
ProductSummary,
[Customers] + (1 - 1 / RANKX ( ProductSummary, [ProductList] ) )
)
)
VAR CurrID =
SELECTEDVALUE ( Table1[customer_id] )
RETURN
MAXX ( FILTER ( Ranked, [Rank] = CurrID ), [ProductList] )
What this does is first create a summary table that computes the product list for each customer.
Then you take that table and summarize over the distinct product lists and counting the number of customers that have each particular combination.
Then I add a ranking column to the previous table ordering first by the number of customers and tiebreaking using a dictionary order of the product list.
Finally, I extract the product list from this table where the rank matches the index/rank of the current row.
You could do a nearly identical measure for the customer count, but here's the measure I used that's a bit simpler and handles 0 values and the total:
Customers =
VAR PerCustomer =
SUMMARIZE (
ALLSELECTED ( Table1 ),
Table1[customer_id],
"ProductList",
CONCATENATEX ( VALUES ( Table1[product] ), Table1[product], "," )
)
VAR ProductCombo = [ProductCombo]
VAR CustomerCount =
SUMX ( PerCustomer, IF ( [ProductList] = ProductCombo, 1, 0 ) )
RETURN
IF (
ISFILTERED ( Table1[customer_id] ),
IF ( CustomerCount = 0, BLANK (), CustomerCount ),
DISTINCTCOUNT ( Table1[customer_id] )
)
The result looks like this