Powerquery, does string contain an item in a list - list

I would like to filter on whether multiple text columns ([Name], [GenericName], or [SimpleGenericName]) contain a substring from a list. The text is also mixed case, so I need to do a Text.Lower([Column]) in there as well.
I've tried the formula:
= Table.SelectRows(#"Sorted Rows", each List.Contains(MED_NAME_LIST, Text.Lower([Name])))
However, this does not work as the Column [Name] does not exactly match those items in the list (e.g. it won't pick up "Methylprednisolone Tab" if the list contains "methylprednisolone")
An example of a working filter, with some of the list written out, is:
= Table.SelectRows(#"Sorted Rows", each Text.Contains(Text.Lower([Name]), "methylprednisolone") or Text.Contains(Text.Lower([Name]), "hydroxychloroquine") or Text.Contains(Text.Lower([Name]), "remdesivir") or Text.Contains(Text.Lower([GenericName]), "methylprednisolone") or Text.Contains(Text.Lower([GenericName]), "hydroxychloroquine") or Text.Contains([GenericName], "remdesivir") or Text.Contains(Text.Lower([SimpleGenericName]), "methylprednisolone") or Text.Contains(Text.Lower([SimpleGenericName]), "hydroxychloroquine") or Text.Contains([SimpleGenericName], "remdesivir"))
I would like to make this cleaner than having to write all of this out, as I would also like to be able to expand the list from a referenced table to make this a dynamic search.
Thank you in advance

If I have a list of medicines and need to filter my table to only keep rows where certain columns (we'll specify which ones exactly later) contain case-insensitive, partial matches for any of the items in that list, then one way to do this might be:
let
MED_NAME_LIST = {"MEthYlprednisolone", "hYdroxychloroquine", "rEMdesivir"},
initialTable = Table.FromRows({
{"Methylprednisolone Tab", "train", "car", "bike"},
{"no", "no", "no", "no"},
{"tram", "teleport", "hydroxychloroQuine Tab", "jet"},
{"no", "no", "no", "yes"},
{"REMdesivir Tab", "bus", "taxi", "concord"}
}, type table [Name = text, GenericName = text, SimpleGenericName = text, SomeOtherColumn = text]),
filtered = Table.SelectRows(initialTable, each List.ContainsAny(
{[Name], [GenericName], [SimpleGenericName]},
MED_NAME_LIST,
(rowValue as text, medicineFromList as text) as logical => Text.Contains(rowValue, medicineFromList, Comparer.OrdinalIgnoreCase)
))
in
filtered
In filtered, List.ContainsAny is used to determine if any of the specified columns (Name, GenericName, SimpleGenericName) contain a "match" for any of the values in MED_NAME_LIST.
The criteria for the "match" is that:
case sensitivity must be ignored (hence Comparer.OrdinalIgnoreCase is used)
the match must be partial (hence Text.Contains is used)
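The above code keeps only the first, third and fifth rows of initialTable (the ones containing "Methylprednisolone Tab", "hydroxychloroQuine Tab" and "REMdesivir Tab"), which I believe is the filtering behaviour you described.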
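Since the question also asks for the list to come from a referenced table, here is a minimal sketch of that variation. MedicineTable and its Medicine column are hypothetical names standing in for whatever query holds the search terms, and the sketch assumes none of the three searched columns contain null.

```powerquery
let
    // Hypothetical lookup table; in practice this could be a reference to another query.
    MedicineTable = Table.FromRows(
        {{"methylprednisolone"}, {"hydroxychloroquine"}, {"remdesivir"}},
        type table [Medicine = text]
    ),
    // Selecting a column with [] returns its values as a list.
    MED_NAME_LIST = MedicineTable[Medicine],
    initialTable = Table.FromRows(
        {
            {"Methylprednisolone Tab", "train", "car"},
            {"no", "no", "no"},
            {"tram", "teleport", "hydroxychloroQuine Tab"}
        },
        type table [Name = text, GenericName = text, SimpleGenericName = text]
    ),
    // Same filter as in the answer, but driven by the table-backed list.
    filtered = Table.SelectRows(
        initialTable,
        each List.ContainsAny(
            {[Name], [GenericName], [SimpleGenericName]},
            MED_NAME_LIST,
            (rowValue, medicineFromList) =>
                Text.Contains(rowValue, medicineFromList, Comparer.OrdinalIgnoreCase)
        )
    )
in
    filtered
```

Adding a row to the lookup table then extends the search without touching the filter step.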

Remove columns by name based on pattern

How can I remove a large number of columns by name based on a pattern?
A data set exported from Jira has a ton of extra columns that I've no interest in. 400 Log entries, 50 Comments, dozens of links or attachments. Problem is that they get random numbers assigned which means that removing them with hardcoded column names will not work. That would look like this and break as the numbers change:
= Table.RemoveColumns(#"Previous Step",{"Watchers", "Watchers_10", "Watchers_11", "Watchers_12", "Watchers_13", "Watchers_14", "Watchers_15", "Watchers_16", "Watchers_17", "Watchers_18", "Watchers_19", "Watchers_20", "Watchers_21", "Watchers_22", "Watchers_23", "Watchers_24", "Watchers_25", "Watchers_26", "Watchers_27", "Watchers_28", "Log Work", "Log Work_29", "Log Work_30", "Log Work_31", "Log Work_32", ...
How can I remove a large number of columns by using a pattern in the name? i.e. remove all "Log Work" columns.
The best way I've found is to use List.FindText on Table.ColumnNames to get a list of column names dynamically based on target string:
= Table.RemoveColumns(#"Previous Step", List.FindText(Table.ColumnNames(#"Previous Step"), "Log Work"))
This works by first grabbing the full list of Column Names and keeping only the ones that match the search string. That's then sent to RemoveColumns as normal.
One limitation appears to be that List.FindText only does plain substring matching; it doesn't offer more complex pattern matching.
Of course, when you want to remove a lot of different patterns, having individual steps isn't very interesting. A way to combine this is to use List.Combine to join the resulting column names together.
That becomes:
= Table.RemoveColumns(L, List.Combine({ List.FindText(Table.ColumnNames(L), "Watchers_"), List.FindText(Table.ColumnNames(L), "Log Work"), List.FindText(Table.ColumnNames(L), "Comment"), List.FindText(Table.ColumnNames(L), "issue link"), List.FindText(Table.ColumnNames(L), "Attachment")} ))
So what's actually written there is:
Table.RemoveColumns(PreviousStep, List.Combine({ foundList1, foundList2, ... }))
Note the { } that signifies a list! You need to use this as List.Combine only accepts a single argument which is itself already a List of lists. And the Combine call is required here.
Also note the L here instead of #"Previous Step". That's used to make the entire thing more readable. It is achieved by inserting a step named "L" that just contains = #"Promoted Headers".
This allows relatively maintainable removal of multiple columns by name, but it's far from perfect.
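As one possible alternative to repeating List.FindText per pattern, the column names can be filtered once against a list of patterns. This is only a sketch: Source and the sample column names are made up for illustration, and Text.Contains could be swapped for Text.StartsWith if a stricter match is wanted.

```powerquery
let
    // Hypothetical stand-in for the previous step, with a few numbered columns.
    Source = Table.FromRows(
        {{"A-1", "x", "y", "z", "w"}},
        {"Key", "Watchers_10", "Log Work_29", "Comment_3", "Summary"}
    ),
    // Substrings that mark a column for removal.
    patterns = {"Watchers_", "Log Work", "Comment"},
    // Keep the column names that contain at least one of the patterns.
    columnsToRemove = List.Select(
        Table.ColumnNames(Source),
        (columnName) => List.AnyTrue(
            List.Transform(patterns, (pattern) => Text.Contains(columnName, pattern))
        )
    ),
    result = Table.RemoveColumns(Source, columnsToRemove)
in
    result
```

With the sample data, only the Key and Summary columns remain. Adding another pattern means adding one item to the patterns list.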

PowerQuery: How to replace text with each column name for multiple columns

I'm trying to replace "x" in each column (except for the first 2 columns) with the column name, in a table with an unknown number of columns but with at least 2 columns.
I found the code to change one column, but I want it to be dynamic:
#"Ersatt värde" = Table.ReplaceValue(Källa,"x", Table.ColumnNames(Källa){2},Replacer.ReplaceText,{Table.ColumnNames(Källa){2}})
Any ideas on how to solve it?
If I understand correctly, I think you can try either approach below:
#"Ersatt värde" =
let
columnsToTransform = List.Skip(Table.ColumnNames(Källa), 2),
accumulated = List.Accumulate(columnsToTransform, Källa, (tableState as table, columnName as text) =>
Table.ReplaceValue(tableState,"x", columnName, Replacer.ReplaceText, {columnName})
)
in accumulated
or:
#"Ersatt värde" =
let
columnsToTransform = List.Skip(Table.ColumnNames(Källa), 2),
transformations = List.Transform(columnsToTransform, (columnName) => {columnName, each
Replacer.ReplaceText(Text.From(_), "x", columnName)}),
transformed = Table.TransformColumns(Källa, transformations)
in transformed
Both ways follow a similar approach:
Figure out which columns to do replacements in (i.e. all except the first 2 columns)
Loop over columns determined in previous step and actually do the replacement.
I've used Replacer.ReplaceText since that's what you'd used in your question, but I believe this will replace both partial matches and full matches.
If you only want full matches to be replaced, I think you can use Replacer.ReplaceValue instead.
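A minimal sketch of the full-match variant, based on the first approach with Replacer.ReplaceValue swapped in. The sample table and its column names are invented for illustration.

```powerquery
let
    // Hypothetical sample data: the first two columns are left alone.
    Källa = Table.FromRows(
        {{"a", "b", "x", "xx"}, {"c", "d", "y", "x"}},
        {"Id", "Label", "Col1", "Col2"}
    ),
    columnsToTransform = List.Skip(Table.ColumnNames(Källa), 2),
    // Replacer.ReplaceValue only replaces cells whose whole value equals "x".
    replaced = List.Accumulate(
        columnsToTransform,
        Källa,
        (tableState, columnName) =>
            Table.ReplaceValue(tableState, "x", columnName, Replacer.ReplaceValue, {columnName})
    )
in
    replaced
```

Here the "x" cells become "Col1" and "Col2" respectively, while "xx" and "y" are left untouched. With Replacer.ReplaceText, the "xx" cell would have been changed as well.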

Text.Trim Syntax Meaning

I'm working in a Power BI query, trying to trim whitespace from text.
Looking at Microsoft's M reference, I came across the Text.Trim syntax:
Text.Trim(text as nullable text, optional trimChars as any) as nullable text
I couldn't figure out how to plug it into my query code correctly (where it would actually work) so I did some more searching and came across this:
#"Trimmed Text" = Table.TransformColumns(#"Removed Other Columns",{},Text.Trim),
...which doesn't look anything like Microsoft's syntax but works fine for me as I insert it into my code like this:
let
Source = Banding,
#"Removed Other Columns" = Table.SelectColumns(Source,{"Segment", "Granular Band"}),
#"Trimmed Text" = Table.TransformColumns(#"Removed Other Columns",{},Text.Trim),
#"Removed Duplicates" = Table.Distinct(#"Trimmed Text", {"Granular Band"})
in
#"Removed Duplicates"
My problem is that I don't understand what the line's syntax meaning is. I understood Microsoft's example as meaning basically, trim THIS; where THIS is the text I want trimmed. Pretty straightforward.
But I don't know what the syntax meaning of the line that actually works is. I understand that it says to transform table columns (Table.TransformColumns); but I don't know if the reference to the previous line (#"Removed Other Columns") serves as any real "input" for the Text.Trim or if {} is a reference to all columns in the table, or (more importantly to me) how I would reference specific columns. (I've tried a few approaches for specifying columns and failed every time.) I also don't understand why I don't need any arguments following Text.Trim (like in Microsoft's example).
If someone would translate what the line is "saying" in a manner I can understand, I'd sure appreciate it.
The generated code:
#"Trimmed Text" = Table.TransformColumns(#"Removed Other Columns",{},Text.Trim),
means that you most probably selected all columns and then you selected Transform - Format - Trim.
If you had selected 1 or more columns, then the names of these columns and the required operations would have been between the {}, like in
= Table.TransformColumns(#"Removed Other Columns",{{"SomeText", Text.Trim}})
Any function that is passed to Table.TransformColumns gets the column values supplied automatically by Table.TransformColumns as its first argument, in this case the first argument of Text.Trim (text as nullable text).
The other arguments use their default values, in this case whitespace, so by default all leading and trailing whitespace is removed.
If you want to use other arguments of Text.Trim within the Table.TransformColumns context, then you need to adjust the code and supply the keyword "each", and use an _ as placeholder for the column values.
For example the next code removes leading and trailing spaces and semicolons:
= Table.TransformColumns(#"Removed Other Columns",{{"SomeText", each Text.Trim(_, {" ",";"})}})
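To make the relationship between the bare function name and "each" more concrete, here is a small sketch with a made-up one-row table. The three steps are intended to be equivalent ways of writing the same transformation.

```powerquery
let
    // Hypothetical sample table with padded text.
    Source = Table.FromRows({{"  a  ", " b "}}, {"SomeText", "Other"}),
    // Passing the function itself: each cell value becomes its first argument.
    viaFunctionName = Table.TransformColumns(Source, {{"SomeText", Text.Trim}}),
    // "each ... _" is shorthand for a one-parameter function whose parameter is named _.
    viaEach = Table.TransformColumns(Source, {{"SomeText", each Text.Trim(_)}}),
    // The same thing written as an explicit function.
    viaLambda = Table.TransformColumns(Source, {{"SomeText", (value) => Text.Trim(value)}}),
    result = [FunctionName = viaFunctionName, Each = viaEach, Lambda = viaLambda]
in
    result
```

The "each" or explicit form is only needed once you want to pass extra arguments such as trimChars.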

How to split column in Power Query by the first space?

I am using Power Query and have a column called LandArea; example data is "123.5 sq mi". It is of data type text. I want to remove the "sq mi" part so I just have the number value, 123.5. I tried the Replace function to replace "sq mi" with blank but that doesn't work because it looks at the entire text. So I tried to use Split where I split it on the space and it generated this formula below, and it did create a new column, but with null for all values. The original column still had "123.5 sq mi".
Table.SplitColumn(#"Reordered Columns1","LandArea",Splitter.SplitTextByDelimiter(" ", QuoteStyle.None),{"LandArea.1", "LandArea.2"})
When just splitting at the left-most delimiter:
Table.SplitColumn(#"Reordered Columns1","LandArea",Splitter.SplitTextByEachDelimiter({" "}, QuoteStyle.None, false),{"LandArea.1", "LandArea.2"})
I have also tried changing to QuoteStyle.Csv. Any idea how I can get this to work?
Use this to create a custom column:
= Table.AddColumn(
#"Reordered Columns1",
"NewColumn",
each Text.Start([LandArea],Text.PositionOf([LandArea]," "))
)
UPDATE: Every one appears to have "sq mi"
= Table.AddColumn(#"Changed Type", "Custom", each Number.FromText(Text.Replace([LandArea]," sq mi","")),type number)
Hope it helps.
This is what I ended up using:
Table.AddColumn(#"Reordered Columns1", "LandArea2",
each Text.Start([LandArea], Text.PositionOf([LandArea], "sq")-1))
I avoided trying to find whitespace.
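Another option, assuming the number is always followed by a space, is Text.BeforeDelimiter, which returns the text before the first occurrence of the delimiter. The sample table below is made up, and the "en-US" culture is an assumption so that "123.5" is parsed with a decimal point regardless of locale.

```powerquery
let
    // Hypothetical sample data in the format described in the question.
    Source = Table.FromRows({{"123.5 sq mi"}, {"48 sq mi"}}, type table [LandArea = text]),
    added = Table.AddColumn(
        Source,
        "LandAreaValue",
        // Take everything before the first space, then convert it to a number.
        each Number.FromText(Text.BeforeDelimiter([LandArea], " "), "en-US"),
        type number
    )
in
    added
```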

Find String from One List within Another List and Return String Found

I found part of what I was looking for at Matchlists/tables in power query, but I need a bit more.
Using the "Flags only" example provided at Matchlists/tables in power query, I’m comparing two lists, ListA and ListB, to check if ListB’s row content appears in ListA’s row content at all. I can’t do a one-for-one match of both rows’ contents (like with List.Intersect) because the content of a row in ListB might only be part of the content of a row in ListA.
Note that, in the query below, ListB includes “roo”, which is the first three letters in the word room. I would want to know that “roo” is in ListA’s row that has “in my room.”
The "Flags only" example provided by Matchlists/tables in power query already determines that “roo” is part of ListA’s row that has “in my room.” I built on the example to assign “yes” instead of true when there is such a match between ListA and ListB.
What I’d like to do is to replace “yes” with the actual value from ListB — the value “roo,” for instance. I tried to simply substitute wordB for “yes” but I got an error that wordB wasn’t recognized.
let
ListA = {"help me rhonda", "in my room", "good vibrations", "god only knows"},
ListB = {"roo", "me", "only"},
contains_word=List.Transform(ListA, (lineA)=>if List.MatchesAny(ListB, (wordB)=>Text.Contains(lineA, wordB)) = true then "yes" else "no")
in
contains_word
The current query results in this:
List
1 yes
2 yes
3 no
4 yes
I want the query results to be:
List
1 roo
2 me
3
4 only
Any idea how to make it so?
(p.s. I'm extremely new to Power Query / M)
Thanks
I would do this way:
let
ListA = {"help me rhonda", "in my room", "good vibrations", "god only knows"},
ListB = {"roo", "me", "only"},
contains_word=List.Transform(ListA, (lineA)=>List.Select(List.Transform(ListB, (wordB)=>if Text.Contains(lineA, wordB) = true then wordB else null), (x)=>x <> null){0}?)
in
contains_word
[edited]
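The idea is to use List.Transform twice: the inner one maps each item of ListB to itself if it is found in the current string and to null otherwise, and List.Select then drops the nulls. The first remaining match (or null if there is none) replaces the string from ListA in the outer List.Transform.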
Edit: I think you switched the first 2 elements of the result?
You can use the following code:
let
ListA = {"help me rhonda", "in my room", "good vibrations", "god only knows"},
ListB = {"roo", "help", "me", "only"},
TableA = Table.FromList(ListA,null,{"ListA"}),
AddedListBMatches = Table.AddColumn(TableA, "ListBMatches", (x) => List.Select(ListB, each Text.PositionOf(x[ListA], _) >= 0)),
ExtractedValues = Table.TransformColumns(AddedListBMatches, {"ListBMatches", each Text.Combine(List.Transform(_, Text.From), ","), type text}),
Result = ExtractedValues[ListBMatches]
in
Result
The "ExtractedValues" step is the result of pressing the expand button in the header of the "ListBMatches" column and choosing Extract Values, comma separated.
This option was added in the January 2017 update.
I added "help" to ListB so the first element of ListA has 2 matches that are both returned.
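If only the first match per string is needed, a slightly shorter sketch is to select the matching items from ListB directly and take the first one with List.First, using null as the default when nothing matches. This uses the lists from the question.

```powerquery
let
    ListA = {"help me rhonda", "in my room", "good vibrations", "god only knows"},
    ListB = {"roo", "me", "only"},
    firstMatches = List.Transform(
        ListA,
        // Keep the ListB items found in the current string, then take the first (or null).
        (lineA) => List.First(
            List.Select(ListB, (wordB) => Text.Contains(lineA, wordB)),
            null
        )
    )
in
    firstMatches
```

This returns "me", "roo", null and "only", in the order of ListA.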