Fuzzy match in Google Sheets - regex

I'm trying to fuzzy match two columns in Google Sheets. I've tried numerous formulas, but I think it's going to come down to a script.
I have a column with product IDs, e.g.
E20067
and then I have another sheet with a column of image URLs relating to this product code, such as
http://wholesale.test.com/product/E20067/web_images/E20067.jpg
http://wholesale.test.com/product/E20067/high_res/E20067.jpg
http://wholesale.test.com/product/E20067/high_res/E20067-2.jpg
What I want to do is "fuzzy" match these two columns on product ID and then create a new column for each match. Each row would have the product ID followed, in separate columns on the same row, by each of that product's image URLs, like the image below:
Is there a way to do this in Google Sheets using a script or a formula?

Google Sheets has a few powerful regex functions.
Suppose you have the ID list in column A and the URL list in column B.
Then use the formula:
=REGEXEXTRACT(B1,JOIN("|",$A$1:$A$3))
It returns whichever ID from the list appears in the URL. Drag the formula down to see the result as in the picture above.
See more info here
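To make the JOIN("|", ...) trick easier to follow, here is a minimal Python sketch of the same idea: build one alternation pattern from the ID list and pull the first matching ID out of each URL. It is only an illustration of the logic, not something that runs inside Sheets. The function name extract_id and the extra IDs E20068 and E20069 are made up for the example. The sketch also escapes each ID and tries longer IDs first, which the formula above does not do.

```python
import re

def extract_id(url, ids):
    """Return the first known product ID found in url, or None."""
    # Longer IDs first, so "E200671" wins over its prefix "E20067".
    ordered = sorted(ids, key=len, reverse=True)
    # Escape each ID so characters such as "." are matched literally,
    # then join with "|" the way JOIN("|", A1:A3) builds the pattern.
    pattern = "|".join(re.escape(i) for i in ordered)
    if not pattern:
        return None  # an empty ID list would otherwise match everything
    match = re.search(pattern, url)
    return match.group(0) if match else None

ids = ["E20067", "E20068", "E20069"]  # E20068/E20069 are hypothetical
url = "http://wholesale.test.com/product/E20067/web_images/E20067.jpg"
print(extract_id(url, ids))  # prints E20067
```

If one ID can be a prefix of another, the order of the alternatives matters, which is why the sketch sorts by length before joining.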

Old thread but, in case you find yourself here, search for my Google Sheets add-on called Flookup. It should do exactly what you want.
For this case, you can use this function:
Flookup (lookupValue, tableArray, lookupCol, indexNum, threshold, [rank], [range])
The parameter details are:
lookupValue: the value you're looking up
tableArray: the table you want to search
lookupCol: the column you want to search
indexNum: the column you want data to be returned from
threshold: the percentage similarity below which data shouldn't be returned
rank: the nth best match (i.e. if the first one isn't to your liking)
range: choose to return the percentage similarity or row number for each match
You can find out more at the official website (examples and such).
Please note that, whereas the OP appears to want the whole list of possible matches, Flookup originally returned only one result at a time.
Update: Flookup can now return a list of all possible matches through its LRM mode.

Try the following. I am assuming the product codes are in Sheet1 and the URLs are in Sheet2, both in column A:
=iferror(transpose(FILTER(Sheet2!$A$2:$A,Search("*"& A2 &"*",Sheet2!$A$2:$A))))
Copy down.
If you want to show the image instead of the URL, try:
=arrayformula(image(iferror(transpose(FILTER(Sheet2!$A$2:$A,Search("*"& A2 &"*",Sheet2!$A$2:$A))))))
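As a rough illustration of what FILTER + SEARCH + TRANSPOSE produces, the following Python sketch builds one row per product ID containing every URL that mentions it. The function name urls_by_product and the ID E99999 are hypothetical. The comparison is case-insensitive to mirror SEARCH.

```python
def urls_by_product(product_ids, urls):
    """Map each product ID to the list of URLs that contain it."""
    rows = {}
    for pid in product_ids:
        needle = pid.lower()
        # Keep every URL that contains the ID anywhere (case-insensitive).
        rows[pid] = [u for u in urls if needle in u.lower()]
    return rows

urls = [
    "http://wholesale.test.com/product/E20067/web_images/E20067.jpg",
    "http://wholesale.test.com/product/E20067/high_res/E20067.jpg",
    "http://wholesale.test.com/product/E20067/high_res/E20067-2.jpg",
]
rows = urls_by_product(["E20067", "E99999"], urls)  # E99999 is hypothetical
for pid, matches in rows.items():
    # One output row per product: the ID, then each URL in its own column.
    print([pid] + matches)
```

An ID with no matching URL simply gets an empty list, which corresponds to the blank cell that IFERROR leaves in the formula version.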

Related

if and search function on Power BI: token literal expected

I have a table called "food suppliers" with a column called "sites" that contains site addresses ending in ".com", ".es", or ".co.uk".
I want to create a new column that maps each site to its corresponding country name using the IF and SEARCH functions in Power Query.
So far, in a Power Query custom column, I have:
Country = IF (SEARCH ("*.com", foodsuppliers[sites],,0) = 0, IF (SEARCH ("*.es", foodsuppliers[sites],, 0)= 0, "Spain","UK"),"USA")
But I am getting a "token literal expected" error under the first = sign in "IF (SEARCH ("*.com", foodsuppliers[sites],,0) = 0".
Does anyone have an idea why, or a better way to do this in Power Query/Power BI?
Thanks.

Very little of this code applies to Power Query:
There is no SEARCH function; use Text.Contains instead.
You can't write foodsuppliers[sites], because that returns the entire column, all rows. You probably want the value in the current row, which you get with [sites].
This is not Excel, where you can write =IF(xxx,fff,zzz). The format is if x then y else z.
I recommend working through some tutorials.
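The row-by-row "if x then y else z" chain described above can be sketched as follows. This is Python, not Power Query M, and is only meant to show the shape of the logic: one site value in, one country out. The country names come from the question's formula. The function name country_for_site and the "Unknown" fallback are assumptions. The sketch tests the end of the string rather than using a contains check, which is a deliberate substitution.

```python
def country_for_site(site):
    """Return a country name for one site value (one row of the column)."""
    site = site.strip().lower()
    # Each branch corresponds to one "if ... then ... else ..." step.
    if site.endswith(".com"):
        return "USA"
    elif site.endswith(".es"):
        return "Spain"
    elif site.endswith(".co.uk"):
        return "UK"
    return "Unknown"  # hypothetical fallback for any other suffix

# Hypothetical sample values for the "sites" column.
for s in ["shop.example.com", "tienda.example.es", "store.example.co.uk"]:
    print(s, "->", country_for_site(s))
```

Checking the suffix rather than a substring avoids misclassifying a site such as "www.company.es", where ".com" appears in the middle of the address.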

How to filter based on the URL of an image in Google Sheets?

I'm trying to create a filter view in Google Sheets that will only show certain rows of the spreadsheet based on the last few characters of the URL of the images that are inserted in every row. For example, most rows have an image that is simply named "image1.png", "image2.png", "image3.png", etc, but every once in a while there'll be a row where the image is named "image63_s.png", "image176_s.png", "image271_s.png", etc. What I'd like to do is create a filter view that will only show rows where the name of the image in the URL ends with "_s".
EDIT: The images are inserted into the sheet with the formula =IMAGE("https://www.example.com/site/image1.png"), so I don't think regex can work here.
Use a custom formula:
=REGEXMATCH(A1, ".*_s.png$")
Update: since the cells contain =IMAGE(...) formulas, match against the formula text instead:
=REGEXMATCH(FORMULATEXT(A1), ".*_s.png.*")
Or, as suggested, keep a hidden helper column of URLs.
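One detail worth knowing about the patterns above: an unescaped "." matches any character, so "_s.png" would also accept a name like "image63_sxpng". The Python sketch below compares the pattern as written with a stricter variant that escapes the dot. The names LOOSE, STRICT, and has_s_suffix are made up for the illustration, and Python's re module stands in for the regex engine Sheets uses.

```python
import re

LOOSE = re.compile(r".*_s.png$")   # pattern as written in the answer
STRICT = re.compile(r"_s\.png$")   # same idea with the dot escaped

def has_s_suffix(url):
    """True if the file name in url ends with "_s.png"."""
    return bool(STRICT.search(url))

samples = [
    "https://www.example.com/site/image1.png",
    "https://www.example.com/site/image63_s.png",
    "https://www.example.com/site/image63_sxpng",  # hypothetical odd name
]
for url in samples:
    print(url, bool(LOOSE.search(url)), has_s_suffix(url))
```

In practice the difference rarely matters for image names, but the escaped form states the intent more precisely.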

UNIQUE formula in Google Sheets for multiple ranges

I have a list of participants in column A. A full employee list in column B. I want to get the list of non-participants in column C. Basically 'B-A' but in list form.
'January' is the participants list:
try:
=FILTER(A:A; NOT(COUNTIF(B:B; A:A)))
It is always an added challenge to write formulas when we don't have access to the actual data. But based on what I can see, try this formula in the top cell of any empty column:
=ArrayFormula({"My Header"; FILTER(R2:R,ISERROR(VLOOKUP(TRIM(R2:R),TRIM(T2:T),1,FALSE)))})
You can change "My Header" to something meaningful.
The next part means "FILTER in anything in the range R2:R that cannot be found [i.e., ISERROR(VLOOKUP(...))] in T2:T."
TRIM is used just to account for any accidental/stray spaces that may occur in either list, since that would result in no match if one or the other had extra space.
If this does not do what you expect, please share a link to a sample spreadsheet.
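The "employees minus participants" logic, including the TRIM step, can be sketched in Python as below. The function name non_participants and the sample names are hypothetical. The comparison ignores case as well as stray spaces, on the assumption that this is the behaviour wanted from the lookup.

```python
def non_participants(employees, participants):
    """Return employees whose trimmed name is not in the participants list."""
    # Normalise once: strip stray spaces and ignore case.
    seen = {p.strip().casefold() for p in participants if p.strip()}
    return [
        e for e in employees
        if e.strip() and e.strip().casefold() not in seen
    ]

employees = ["Ann", "Bob ", "Cy", ""]   # hypothetical full employee list
participants = [" Bob", "ann"]          # hypothetical participants list
print(non_participants(employees, participants))  # prints ['Cy']
```

Blank cells are skipped so that the open-ended ranges (such as R2:R) do not contribute empty rows to the result.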

How to get the reference of a cell based on two filters in Google Sheets?

I am parsing JSON data from a FB Ads Campaign through Graph API in Google Sheets. I have multiple sheets for different ad insights based on a timeframe (today, yesterday, 7 days, 30 days) and a dashboard that shows a snapshot of the most important data like the # of conversions and cost-per-conversion for each campaign.
On the dashboard page, I want to match the Adset ID with the value next to cells that contain 'complete_registration' on an insight page.
This is the current formula I have
=INDIRECT(INDEX("h"&filter(ROW('Todays Insights'!G1:G901),'Todays Insights'!G1:G901="complete_registration")),1)
This works for the first row where 'complete_registration' appears, but I want the value for each Adset.
For example: if column A has the Adset_ID and column B has complete_registration, then return the value in column C.
What formula would accomplish this in Google Sheets?
OK, I think I get it. You want two columns as a result, column B and column H, for the rows where column C is complete_registration.
Try:
=filter({'Todays Insights'!a1:a,'Todays Insights'!h1:h},'Todays Insights'!c1:c="complete_registration")
Here is the formula I used that works:
=DGET('Sheet Name'!$A$1:$N$500,"Actions Value",{"Adset ID","Actions Action Type";C24,"lead"})
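Both formulas above do the same thing conceptually: keep the rows whose action type matches, then return the Adset ID together with the value. A minimal Python sketch of that filter follows. The field names adset_id, action_type, and value, the sample rows, and the function name values_for_action are all hypothetical stand-ins for the sheet's columns.

```python
def values_for_action(rows, action_type):
    """Return (adset_id, value) pairs for rows with the given action type."""
    return [
        (r["adset_id"], r["value"])
        for r in rows
        if r["action_type"] == action_type
    ]

# Hypothetical rows standing in for the 'Todays Insights' sheet.
insights = [
    {"adset_id": "A1", "action_type": "link_click", "value": 40},
    {"adset_id": "A1", "action_type": "complete_registration", "value": 3},
    {"adset_id": "B2", "action_type": "complete_registration", "value": 7},
]
print(values_for_action(insights, "complete_registration"))
```

Returning the ID alongside the value is what lets the dashboard line each result up with the right Adset, instead of only seeing the first match.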

String intersection (find video with matching tags) in PostgreSQL

I have a Postgres database with videos that have tags. The tags are stored alphabetically in a semicolon delimited list. I want to be able to query the database with a list of tags and return the video with the highest match.
I've looked at using regexp_match, ~, and others.
The best I've come up with thus far is a mediocre heuristic that searches for tags with a regex, i.e.
SELECT * FROM videos WHERE tags ~ 'kitten.*laser'
Bonus (imo) that this will also match tags like fat-kitten or big-laser
but the problem here is that if a video is missing one of those tags then I won't get it in my results, and if a user picks too many tags then they won't see any videos. To remedy that, I started iterating until I had the number of videos I wanted, popping off the less relevant tags each time, but that's probabilistic at best and a disaster at worst.
What I'm looking for is some kind of Postgres query where I can pass in a regex and find the results from videos with the largest intersection.
For example, let's pretend we're querying from the following data:
cat;disaster;mouse
kitten;mouse;piano
cat;mouse;keyboard
An optimal query for the tags cat, mouse, keyboard would return rows in the following order
cat;mouse;keyboard
cat;disaster;mouse
kitten;mouse;piano
because the 1st row contains 3 matches, the next row contains 2 matches, and the last row contains one match.
To find the rows with the tags, you can use Postgres' array handling, which might be more efficient than regular expressions.
select *
from tag
where string_to_array(tags, ';') && array['cat', 'mouse', 'keyboard'];
&& means "overlaps": if the left-hand and right-hand sides have at least one element in common, the row is returned. Unfortunately, there is no "intersect" operator for arrays, which would let you rank the results. It will, however, not match fat-cat.
The above could be sped up by creating a GIN index on the expression string_to_array(tags, ';'), because the && operator on arrays can use such an index (but GIN indexes are more expensive to build than regular B-Tree indexes).
Tags are a classic many-to-many thing. Would it be possible to move your tags into their own table? You'd also need a join table that holds the links between tags and videos. Apologies if you bypassed this approach for a reason, but I thought I'd throw it out there since it's fairly well traveled.
Assuming table:
create table tag (tags text);
insert into tag values
('cat;disaster;mouse'),
('kitten;mouse;piano'),
('cat;mouse;keyboard');
The following query sorts the results by the number of matching tags:
select
    tags
from tag
order by
    (select
         sum(case t.tag in ('cat', 'mouse', 'keyboard') when true then 1 else 0 end) as match
     from regexp_split_to_table(tags, ';') as t(tag))
    desc;
Unfortunately, the
"Bonus (imo) that this will also match tags like fat-kitten or big-laser"
no longer applies, but the query can be rewritten slightly to bring it back.
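The ranking idea itself, scoring each row by how many requested tags it contains and sorting by that score, can be sketched outside the database in a few lines of Python. This is only an illustration of the logic, not a replacement for the SQL. The function names are made up, and overlap_substring is one possible way to restore the fat-kitten style of partial match.

```python
def overlap_exact(tag_string, wanted):
    """Number of wanted tags that appear as whole tags in tag_string."""
    return len(set(wanted) & set(tag_string.split(";")))

def overlap_substring(tag_string, wanted):
    """Number of wanted tags that appear inside any tag (fat-kitten style)."""
    tags = tag_string.split(";")
    return sum(any(w in t for t in tags) for w in wanted)

def rank_by_tag_overlap(tag_strings, wanted, score=overlap_exact):
    """Sort rows so the ones sharing the most tags with wanted come first."""
    return sorted(tag_strings, key=lambda s: score(s, wanted), reverse=True)

rows = ["cat;disaster;mouse", "kitten;mouse;piano", "cat;mouse;keyboard"]
for row in rank_by_tag_overlap(rows, ["cat", "mouse", "keyboard"]):
    print(row)
```

With the sample data from the question, this prints the rows in the order the question asks for: three matches, then two, then one.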