How can I search a list of text strings for more than one word or collection of words in Power Query?

I have a table of data that I converted into a list using Table.ColumnNames. From this list I want to select multiple items, put them into a new list, and drop every item I did not select. For example, my current list contains {Apple, Pear, Orange, Banana}, and I want to extract "Apple" and "Banana" into a new list.
I tried doing this with List.Contains and List.FindText, but each accepts only one search value, such as "Apple" or "Banana", not both.
If anyone has a solution for this it would be great!!

You want List.Intersect or List.Difference. See the documentation at:
https://learn.microsoft.com/en-us/powerquery-m/list-difference
https://learn.microsoft.com/en-us/powerquery-m/list-intersect
The following looks for {Apple, Pear, Dog} in the list {Apple, Pear, Orange, Banana} and returns {Apple, Pear}:
= List.Intersect ({{"Apple", "Pear", "Orange", "Banana"},{"Apple", "Pear", "Dog"}})
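Applied to the list in the question, a minimal sketch might look like the following. The step names (Source, Wanted, Kept, Dropped) are illustrative assumptions, not part of the original answer. List.Intersect keeps the selected items, while List.Difference returns the items that were not selected.

```powerquery
let
    // The list from the question, e.g. the result of Table.ColumnNames
    Source = {"Apple", "Pear", "Orange", "Banana"},
    // The items to keep
    Wanted = {"Apple", "Banana"},
    // Items present in both lists: {"Apple", "Banana"}
    Kept = List.Intersect({Source, Wanted}),
    // Items of Source that are not in Wanted: {"Pear", "Orange"}
    Dropped = List.Difference(Source, Wanted)
in
    [Kept = Kept, Dropped = Dropped]
```

Both functions compare case-sensitively by default. If case should be ignored, each accepts an optional equation criteria argument such as Comparer.OrdinalIgnoreCase.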

Related

Powerquery, does string contain an item in a list

I would like to filter on whether any of several text columns ([Name], [GenericName], or [SimpleGenericName]) contains a substring from a list. The text is also mixed case, so I need to apply Text.Lower([Column]) in there as well.
I've tried the formula:
= Table.SelectRows(#"Sorted Rows", each List.Contains(MED_NAME_LIST, Text.Lower([Name])))
However, this does not work because List.Contains requires the value in [Name] to match a list item exactly (e.g. it won't pick up "Methylprednisolone Tab" if the list contains "methylprednisolone").
An example of a working filter, with some of the list written out, is:
= Table.SelectRows(#"Sorted Rows", each Text.Contains(Text.Lower([Name]), "methylprednisolone") or Text.Contains(Text.Lower([Name]), "hydroxychloroquine") or Text.Contains(Text.Lower([Name]), "remdesivir") or Text.Contains(Text.Lower([GenericName]), "methylprednisolone") or Text.Contains(Text.Lower([GenericName]), "hydroxychloroquine") or Text.Contains([GenericName], "remdesivir") or Text.Contains(Text.Lower([SimpleGenericName]), "methylprednisolone") or Text.Contains(Text.Lower([SimpleGenericName]), "hydroxychloroquine") or Text.Contains([SimpleGenericName], "remdesivir"))
I would like to make this cleaner than having to write all of this out, as I would also like to be able to expand the list from a referenced table to make this a dynamic search.
Thank you in advance
If I have a list of medicines (MED_NAME_LIST in the code below) and I need to filter my table (initialTable below) to only keep rows where certain columns (we'll specify which ones exactly later) contain case-insensitive, partial matches for any of the items in that list of medicines, then one way to do this might be:
let
    MED_NAME_LIST = {"MEthYlprednisolone", "hYdroxychloroquine", "rEMdesivir"},
    initialTable = Table.FromRows({
        {"Methylprednisolone Tab", "train", "car", "bike"},
        {"no", "no", "no", "no"},
        {"tram", "teleport", "hydroxychloroQuine Tab", "jet"},
        {"no", "no", "no", "yes"},
        {"REMdesivir Tab", "bus", "taxi", "concord"}
    }, type table [Name = text, GenericName = text, SimpleGenericName = text, SomeOtherColumn = text]),
    filtered = Table.SelectRows(initialTable, each List.ContainsAny(
        {[Name], [GenericName], [SimpleGenericName]},
        MED_NAME_LIST,
        (rowValue as text, medicineFromList as text) as logical => Text.Contains(rowValue, medicineFromList, Comparer.OrdinalIgnoreCase)
    ))
in
    filtered
In filtered, List.ContainsAny is used to determine if any of the specified columns (Name, GenericName, SimpleGenericName) contain a "match" for any of the values in MED_NAME_LIST.
The criteria for the "match" is that:
case sensitivity must be ignored (hence Comparer.OrdinalIgnoreCase is used)
the match must be partial (hence Text.Contains is used)
The above code keeps only the first, third and fifth rows of initialTable (the ones containing a medicine name), which I believe is the filtering behaviour you described.
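Since the question also asks for the list to come from a referenced table, here is a hedged sketch of that variation. MedicineTable, ColumnsToSearch and CellMatches are hypothetical names introduced only for illustration, and the sample table is shortened. The sketch also guards against null cells, which the original example does not contain.

```powerquery
let
    // Hypothetical lookup table; in a real workbook this would be a referenced query
    MedicineTable = Table.FromRows(
        {{"methylprednisolone"}, {"hydroxychloroquine"}, {"remdesivir"}},
        type table [Medicine = text]
    ),
    MED_NAME_LIST = MedicineTable[Medicine],
    // Columns to search, kept as a list so it can be changed in one place
    ColumnsToSearch = {"Name", "GenericName", "SimpleGenericName"},
    initialTable = Table.FromRows({
        {"Methylprednisolone Tab", "train", "car", "bike"},
        {"no", "no", "no", "no"},
        {"tram", "teleport", "hydroxychloroQuine Tab", "jet"}
    }, type table [Name = text, GenericName = text, SimpleGenericName = text, SomeOtherColumn = text]),
    // True when the cell is non-null and contains any medicine name, ignoring case
    CellMatches = (cell as nullable text) as logical =>
        cell <> null and List.MatchesAny(
            MED_NAME_LIST,
            (medicine) => Text.Contains(cell, medicine, Comparer.OrdinalIgnoreCase)
        ),
    // Keep a row when at least one of the searched columns matches
    filtered = Table.SelectRows(
        initialTable,
        (row) => List.MatchesAny(
            Record.ToList(Record.SelectFields(row, ColumnsToSearch)),
            CellMatches
        )
    )
in
    filtered
```

With this sample data the first and third rows are kept. Adding a medicine to the lookup table or a column name to ColumnsToSearch changes the filter without touching the filtering step itself.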

Data intersection yes or no in google Sheets: Looking for ANY of the list items in a defined string (which is also the result of an array formula)

We have the following situation.
We have 1 cell to show the result of a determination in.
We have 1 cell with data to check for (String, but assume result of array formula concatenated so can use the string or the array)
And 1 list of items to check against (range reference)
In other words:
The resulting cell should loop through all the items in LIST OF FRUITS. If ANY of the fruits is found in the string (or in the array before we join it into a string) in CELL TO CHECK, then print "Found", or, if easier, print the list item that was found.
Put even more simply: how do we easily determine whether two collections intersect, where one is the result of an array formula and the other is a range reference?
Question: Is this possible and if so then how?
TEST A (Found)
RESULT CELL    CELL TO CHECK
Found!         mangoes, apples
TEST B (Not found)
RESULT CELL    CELL TO CHECK
-              Pineapple
LIST OF FRUITS
Apples
Pears
Bananas
Mangoes
try:
=ARRAYFORMULA(IF(B1:B="",,IF(REGEXMATCH(LOWER(B1:B),
LOWER(TEXTJOIN("|", 1, D:D))), "found!", "-")))

Power Query check if string contains strings from a list

Is there a way to check a text field to see if it contains any of the strings from a list?
Example Strings to Check:
The raisin is green
The pear is red
The apple is yellow
List Example to Validate Against
red
blue
green
The result would be
either:
green
red
null
or:
TRUE
TRUE
FALSE
Daniel has a decent solution, but it won't work if the example strings aren't space-separated. For example, in "The brick is reddish" it would not detect red as a substring.
You can create a custom column with this formula instead:
(C) => List.AnyTrue(List.Transform(Words, each Text.Contains(C[Texts], _)))
This takes the list Words = {"red","blue","green"} and checks, for each color in the list, whether it is contained in the [Texts] column for that row. If any of them is, it returns TRUE; otherwise it returns FALSE.
The whole query looks like this:
let
    TextList = {"The raisin is green","The pear is red","The apple is yellow"},
    Texts = Table.FromList(TextList, Splitter.SplitByNothing(), {"Texts"}, null, ExtraValues.Error),
    Words = {"red","blue","green"},
    #"Added Custom" = Table.AddColumn(Texts, "Check", (C) => List.AnyTrue(List.Transform(Words, each Text.Contains(C[Texts], _))))
in
    #"Added Custom"
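Text.Contains is case-sensitive by default, so "GREEN" would not match "green" in the query above. A minimal sketch of a case-insensitive variant follows; it assumes that ignoring case is wanted, uses List.MatchesAny in place of List.AnyTrue over List.Transform, and the step name AddedCheck and the upper-cased sample text are illustrative.

```powerquery
let
    // Sample data; the first sentence is deliberately in mixed case
    Texts = Table.FromColumns(
        {{"The raisin is GREEN", "The pear is red", "The apple is yellow"}},
        {"Texts"}
    ),
    Words = {"red", "blue", "green"},
    // true when any word occurs in the row's text, ignoring case
    AddedCheck = Table.AddColumn(
        Texts,
        "Check",
        (row) => List.MatchesAny(
            Words,
            (word) => Text.Contains(row[Texts], word, Comparer.OrdinalIgnoreCase)
        ),
        type logical
    )
in
    AddedCheck
```

For the three sample rows the Check column is true, true, false.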
This will do the trick. It's Power Query ("M") code:
let
    Texts = {"The raisin is green","The pear is red","The apple is yellow"},
    Words = {"red","blue","green"},
    TextsLists = List.Transform(Texts, each Text.Split(_," ")),
    Output = List.Transform(TextsLists, each List.Count(List.Intersect({_,Words}))>0)
in
    Output
There are two lists: the sentences (Texts) and the words to check (Words). The first thing to do is to convert the sentences into lists of words by splitting the strings, using " " as the delimiter.
TextsLists = List.Transform(Texts, each Text.Split(_," ")),
Then you intersect each of the new lists with the list of Words. The results are lists of the elements (strings) that appear in both lists (TextsLists and Words). Now you count the elements of each result and check whether the count is greater than zero.
Output = List.Transform(TextsLists, each List.Count(List.Intersect({_,Words}))>0)
Output is a new list {true, true, false}.
Alternatively, you can replace the Output line with this one:
Output = List.Transform(TextsLists, each List.Intersect({_,Words}){0}?)
This will return a list containing the first match for each sentence, or null if there is no match. In the example: {"green", "red", null}
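Splitting on a single space treats "red," or "Green." as different words from "red" and "green". A hedged sketch of a variant that lower-cases the text and splits on several separator characters is shown below. The Separators string and the sample sentences are assumptions chosen for illustration, and Words is assumed to be lower-case already.

```powerquery
let
    Texts = {"The raisin is Green.", "The pear is red, I think", "The brick is reddish"},
    Words = {"red", "blue", "green"},
    // Characters treated as word separators (an assumption; extend as needed)
    Separators = " .,;:!?",
    // Lower-case each sentence, then split it on any of the separator characters
    TextsLists = List.Transform(Texts, each Text.SplitAny(Text.Lower(_), Separators)),
    // First whole-word match per sentence, or null when there is none
    Output = List.Transform(TextsLists, each List.Intersect({_, Words}){0}?)
in
    Output
```

With these sample sentences Output is {"green", "red", null}: "reddish" is not matched because the comparison is on whole words.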
Hope this helps you.
each if Text.Remove([Texts], {"The raisin is green","The pear is red","The apple is yellow"})<>[Texts] then ...

keyword inspection based on words present in multiple lists

I have a dictionary similar to this:
countries = ["usa", "france", "japan", "china", "germany"]
fruits = ["mango", "apple", "passion-fruit", "durion", "bananna"]
cf_dict = {k:v for k,v in zip(["countries", "fruits"], [countries, fruits])}
and I also have a list of strings similar to this:
docs = ["mango is a fruit that is very different from Apple","I like to travel, last year I was in Germany but I like France.it was lovely"]
I would like to inspect docs and check whether each string contains any of the keywords in any of the lists in cf_dict (the values of cf_dict are lists). If a keyword is present, the output should contain the corresponding key for that string.
So, for instance, if I inspect the list docs, the output will be [fruits, countries].
This is similar to this answer, but that one checks only one list, whereas I would like to check multiple lists.
The following returns a dict of sets in case a string matches values in more than one list (e.g. 'apple grows in USA' should be mapped to {'fruits', 'countries'}).
print({s: {k for k, l in cf_dict.items() for w in l if w in s.lower()} for s in docs})
This outputs:
{'mango is a fruit that is very different from Apple': {'fruits'}, 'I like to travel, last year I was in Germany but I like France.it was lovely': {'countries'}}

Find String from One List within Another List and Return String Found

I found part of what I was looking for at Matchlists/tables in power query, but I need a bit more.
Using the "Flags only" example provided at Matchlists/tables in power query, I’m comparing two lists, ListA and ListB, to check if ListB’s row content appears in ListA’s row content at all. I can’t do a one-for-one match of both rows’ contents (like with List.Intersect) because the content of a row in ListB might only be part of the content of a row in ListA.
Note that, in the query below, ListB includes “roo”, which is the first three letters in the word room. I would want to know that “roo” is in ListA’s row that has “in my room.”
The "Flags only" example provided by Matchlists/tables in power query already determines that “roo” is part of ListA’s row that has “in my room.” I built on the example to assign “yes,” instead of true when there is such a match between the ListA and ListB.
What I’d like to do is to replace “yes” with the actual value from ListB — the value “roo,” for instance. I tried to simply substitute wordB for “yes” but I got an error that wordB wasn’t recognized.
let
    ListA = {"help me rhonda", "in my room", "good vibrations", "god only knows"},
    ListB = {"roo", "me", "only"},
    contains_word = List.Transform(ListA, (lineA) =>
        if List.MatchesAny(ListB, (wordB) => Text.Contains(lineA, wordB)) = true then "yes" else "no")
in
    contains_word
The current query results in this:
List
1 yes
2 yes
3 no
4 yes
I want the query results to be:
List
1 roo
2 me
3
4 only
Any idea how to make it so?
(p.s. I'm extremely new to Power Query / M)
Thanks
I would do it this way:
let
    ListA = {"help me rhonda", "in my room", "good vibrations", "god only knows"},
    ListB = {"roo", "me", "only"},
    contains_word = List.Transform(ListA, (lineA) =>
        List.Select(
            List.Transform(ListB, (wordB) => if Text.Contains(lineA, wordB) = true then wordB else null),
            (x) => x <> null
        ){0}?
    )
in
    contains_word
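The idea is to use List.Transform twice: the inner one maps each value in ListB to itself if it occurs in the current line of ListA, and to null otherwise. List.Select then drops the nulls, and the first remaining value (or null if there is none) replaces the string from ListA in the outer List.Transform.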
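The same result can be sketched a little more compactly by filtering ListB directly and taking the first hit with List.First, whose second argument is the default returned for an empty list. This is an alternative formulation rather than the answerer's own code, and the step name FirstMatch is illustrative.

```powerquery
let
    ListA = {"help me rhonda", "in my room", "good vibrations", "god only knows"},
    ListB = {"roo", "me", "only"},
    // For each line, keep the ListB words it contains and return the first one (or null)
    FirstMatch = List.Transform(
        ListA,
        (lineA) => List.First(
            List.Select(ListB, (wordB) => Text.Contains(lineA, wordB)),
            null
        )
    )
in
    FirstMatch
```

For these lists the result is {"me", "roo", null, "only"}.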
Edit: I think you switched the first 2 elements of the result?
You can use the following code:
let
    ListA = {"help me rhonda", "in my room", "good vibrations", "god only knows"},
    ListB = {"roo", "help", "me", "only"},
    TableA = Table.FromList(ListA,null,{"ListA"}),
    AddedListBMatches = Table.AddColumn(TableA, "ListBMatches", (x) => List.Select(ListB, each Text.PositionOf(x[ListA], _) >= 0)),
    ExtractedValues = Table.TransformColumns(AddedListBMatches, {"ListBMatches", each Text.Combine(List.Transform(_, Text.From), ","), type text}),
    Result = ExtractedValues[ListBMatches]
in
    Result
The "ExtractedValues" step is the result of pressing the expand button in the header of the "ListBMatches" column and choosing Extract Values with a comma as the delimiter.
This option was added in the January 2017 update.
I added "help" to ListB so the first element of ListA has 2 matches that are both returned.