How to use Calculated Join in Tableau using 1-M relationship and regex_extract - regex

Question 1 - Is Tableau able to use multiple results from a single line in a REGEXP, using the global flag, to compare against another table during a join operation? If no, question 2 is moot. If yes...
Question 2 - I'm attempting to join two data sources in Tableau using a regexp in a calculated join because the left table has 1 value in each cell (i.e. 64826) and the right table has 4 possible matches in each cell (i.e. 00000|00000|21678|64826).
The problem is that my regex stops looking after it finds 1 match (the first of 4 values), and the global flag /g has the opposite effect from what I expected and eliminates all matches.
I've tried calculated joins on the Data Source tab. I've also tried separating those 4 values into their own columns in worksheets using regexp_extract_nth. In both cases, regex stops looking after the first result. A Left Join seems to work somewhat, while an Outer Join returns nothing.
REGEXP_EXTRACT([Event Number],'(\d{5})')
REGEXP_EXTRACT_NTH([Event Number],'(?!0{5})(\d{5})',1)
With these examples, regex would match a NULL with the left table even though 64826 is in the right table. I expect the calculated join to return all possible matches from the right set, so there'd be a match on 21678 and on 64826, duplicating rows in the right table like so...
21678 - 00000|00000|21678|64826
64826 - 00000|00000|21678|64826
45245 - 45106|45245|00000|00000
45106 - 45106|45245|00000|00000

Your original expression looks fine. It may be worth checking that the command is being passed to Tableau correctly, which I'm not sure about. Just for testing, try an expression similar to:
\b([^0]....)\b
and then change the commands to:
REGEXP_EXTRACT([Event Number], '\b([^0]....)\b')
or:
REGEXP_EXTRACT_NTH([Event Number], '\b([^0]....)\b', 1)
to see what happens. I'm assuming that the desired numbers never start with 0.
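To try the pattern outside Tableau, here is a minimal Python sketch of the one-to-many match the question describes. It assumes Python's `re` module treats this pattern the same way Tableau's regex engine does, which may not hold exactly; the function and variable names are made up for illustration.

```python
import re

# Pattern from the answer: a 5-character token that does not start with 0.
PATTERN = re.compile(r"\b([^0]....)\b")

def expand_matches(right_rows):
    """Return one (code, original_cell) pair per non-zero code found in each cell."""
    pairs = []
    for cell in right_rows:
        for code in PATTERN.findall(cell):  # findall returns every match, not just the first
            pairs.append((code, cell))
    return pairs

# Sample values taken from the question.
right_rows = ["00000|00000|21678|64826", "45106|45245|00000|00000"]
left_values = {"21678", "64826", "45245", "45106"}

# Keep only the pairs whose code appears in the left table (the join condition).
joined = [(code, cell) for code, cell in expand_matches(right_rows) if code in left_values]
for code, cell in joined:
    print(code, "-", cell)
```

Each right-hand cell appears once per matching code, which is the row duplication the question expects from the join.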

Related

Google Sheets - filter list excluding values from other list

I have a big sheet raw_data which is automatically populated by a script every 5 minutes. As such I cannot add new columns with formulas but have to solve problems in single formulas.
The challenge:
I need to pull out a list of unique values from a column O. At the same time, I need to filter out a certain set of values in range A55:A
I have this formula to pull out the unique values:
=SORT(UNIQUE(raw_data!O2:O))
I tried playing with MATCH, but how do I "inverse" the result from the match, as I'm actually looking to exclude rather than include:
=SORT(UNIQUE(FILTER(raw_data!O2:O,IFERROR((Match(raw_data!O2:O,A75:A200,0))))))
I tried adding a NOT() around the Match() but that then gave me a no results error.
Anyone?
Instead of NOT, use ISNA.
This works because MATCH returns #N/A when there is no match, so ISNA(MATCH(...)) is TRUE exactly for the values that should be kept.
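The logic behind the ISNA filter can be sketched in Python as follows. This is only an illustration of the idea, not a Sheets formula; the function name and the sample data are hypothetical.

```python
def unique_excluding(values, excluded):
    """Sorted unique non-empty values that do NOT appear in `excluded`.

    `v not in blocked` plays the role of ISNA(MATCH(v, excluded, 0)).
    """
    blocked = set(excluded)
    return sorted({v for v in values if v != "" and v not in blocked})

# Hypothetical stand-ins for raw_data!O2:O and the exclusion range.
column_o = ["beta", "alpha", "gamma", "beta", "delta", ""]
exclude_list = ["gamma", "delta"]

print(unique_excluding(column_o, exclude_list))
```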

Use multiple replace conditions for a single column in Amazon Redshift

I have a table where the amount column contains , and $ signs, for example $8,122.14. I want to write a replace function that removes both $ and , from that column in one go. Is there any way to write multiple conditions in one replace in Redshift? This is part of post-processing the data, where I am inserting data from a stage table into a final table after replacing these values.
I tried the two ways listed below as take 1 and take 2.
Take 1:
insert into db.stage_table
select
(coalesce(replace(logging_amount,'$',','),''))) as logging_amount
from db.table;
Take 2:
insert into db.stage_table
select
(coalesce(replace(logging_amount,'$',',')) as logging_amount
from db.table;
Both of them failed.
The expected result is a single statement that performs both replacements.
Yes, you can nest replace calls like this:
replace(replace(logging_amount,'$',''),',','')
Or you can use a regex if you prefer (personally, for something like this I think nested replaces are easier to read).
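For comparison, here is a small Python sketch of the two approaches (nested replace versus a single regex). It only illustrates the string logic and is not Redshift syntax; the function names are invented for this example.

```python
import re

def strip_currency_nested(amount):
    """Remove '$' and ',' with two chained replace calls."""
    return amount.replace("$", "").replace(",", "")

def strip_currency_regex(amount):
    """Remove '$' and ',' in one pass using a character class."""
    return re.sub(r"[$,]", "", amount)

sample = "$8,122.14"  # value taken from the question
print(strip_currency_nested(sample))
print(strip_currency_regex(sample))
```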

How do I apply a formula to a range without applying said formula to every cell?

I'm trying to apply a formula without having it add the formula data to each and every cell - in other words, I need the cells that are receiving the formula to be untouched until they get their data.
I was searching around and it looked like an ARRAYFORMULA would work but it doesn't seem to be doing anything when I apply it.
For example, I want to apply this formula to a cell range: =SPLIT(E2, ","). Each cell in the E column needs to be split into the two adjacent cells next to it based on its comma. When I try to apply =ARRAYFORMULA(SPLIT(E2:E99, ",")), only the cell I add this to gets the formula.
In addition to the contribution of pnuts, also try:
=ArrayFormula(iferror(REGEXEXTRACT(","&E2:E,"^"&REPT(",+[^,]+",COLUMN(OFFSET(A1,,,1,6))-1)&",+([^,]+)")))
Note: the last parameter of OFFSET can be changed to match the maximum number of comma-separated values you have in the cells of the range E2:E. E.g. if you have no more than 3 values per cell, set it to 3. The output will then be three columns wide (one column for each value).
Hope that makes sense.
Credit is also due to AdamL who (I believe) originally crafted this workaround.
I think what you want may be array_constrain but for your example I can only at present offer you two formulae (one for each side of the comma):
=Array_constrain(arrayformula(left(E2:E,find(",",E2:E)-1)),match("xxx",E:E)-1,1)
=Array_constrain(arrayformula(mid(E2:E,find(",",E2:E)+1,len(E2:E))),match("xxx",E:E)-1,1)

Conditional Vlook up without using VBA

I want to convert an input to desired output. Kindly help.
In the output - the columns value should start from most recent (year)
Please click this to see data
Unfortunately VLOOKUP is not able to fulfill that ask. However the INDEX-function can.
Here is a good read on how to use it:
http://fiveminutelessons.com/learn-microsoft-excel/use-index-lookup-multiple-values-list
This will work for your spreadsheet if your input table starts at A1 without a header and your output table starts at H3 with the first ID.
You get the IDs by copying and pasting the first column of your input table into column H and then removing duplicates.
{=IF(ISERROR(INDEX($A$1:$C$7,SMALL(IF($A$1:$A$7=$H$3,ROW($A$1:$A$7)),ROW(1:1)),3)),"",
INDEX($A$1:$C$7,SMALL(IF($A$1:$A$7=$H$3,ROW($A$1:$A$7)),ROW(1:1)),3))}
Let's look at the formula step by step:
The curly brackets tell Excel that this is an array formula. The interesting part for you: after you've typed the formula (without the curly brackets), press Ctrl+Shift+Enter and Excel will treat it as an array formula.
'error at formula?, then blank, else formula
=IF(ISERROR(....),"",...)
When you autofill this formula you probably don't know how many instances of your lookup value there are. So when you put this formula in 4 cells but there are only 3 entries, this bit keeps the extra cell blank instead of showing an error.
INDEX($A$1:$C$7,SMALL(IF($A$1:$A$7=$H$3,ROW($A$1:$A$7)),ROW(1:1)),3)
$A$1:$C$7 is your data matrix. Your IDs (in your case 125 and 501) are to be found in $A$1:$A$7. ROW(1:1) is the absolute(!) rowID, 3 the absolute(!) column id. So when you move your input table those values have to be changed.
What exactly SMALL and INDEX do is well described in the link above. (Or at least better than I could.)
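As a rough illustration of what the formula computes, the following Python sketch returns the k-th row that matches a lookup ID, or a blank when there are fewer than k matches. The function name and the sample rows are hypothetical (only the IDs 125 and 501 come from the answer), and rows are 0-based lists here rather than worksheet rows.

```python
def kth_match(rows, lookup_id, k, column=3):
    """Value in `column` (1-based) of the k-th row whose first column equals lookup_id.

    Returns "" when there are fewer than k matches, like IF(ISERROR(...), "", ...).
    """
    # Counterpart of IF($A$1:$A$7=$H$3, ROW($A$1:$A$7)): positions of matching rows.
    matching = [i for i, row in enumerate(rows) if row[0] == lookup_id]
    if k < 1 or k > len(matching):
        return ""
    # Counterpart of INDEX(data, SMALL(positions, k), column).
    return rows[matching[k - 1]][column - 1]

# Hypothetical input table: (ID, year, value).
rows = [
    (125, 2014, "A"),
    (501, 2014, "B"),
    (125, 2013, "C"),
    (501, 2012, "D"),
    (125, 2012, "E"),
]

# Filling the formula down corresponds to increasing k.
print([kth_match(rows, 125, k) for k in range(1, 5)])
```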
Hope that clarified some parts,
Tom

Optimize join with regex

I have one table (A) with a phrase, and the other (B) is a phrase that I want to find WITHIN table A's phrase. So I'm joining them as follows:
Create table C as
SELECT A.*
FROM A
JOIN B
where (A.phrase LIKE concat("%",B.phrase,"%"));
It is taking a long time because it's only using one reducer, and I believe this has to do with the nature of the query? Is there a way of speeding this up? I don't think a mapjoin or bucketjoin would help, because I'm not equating two columns, but rather, searching within one table for words from another table...
I found the solution.
The problem was that Hive doesn't do non equi joins well. So I did equi joins to get a subset of table A before I did the non equi join regex. So, 3 steps.
1. Break A.phrase and B.phrase into individual words.
2. Equate these words to see which keywords from B.phrase are equal to any keywords from A.phrase - this gives a subset of table A where A.phrase contains at least one keyword from B.phrase.
3. Use this table A subset to find the whole "%B.phrase%".
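The three steps can be sketched in Python to show the idea of pre-filtering with an equality match before running the expensive substring test. This is an in-memory illustration, not Hive code; the function name and sample phrases are hypothetical, and it assumes phrases are split on whitespace, so a B phrase that matches only part of a word in A would be missed by the pre-filter.

```python
from collections import defaultdict

def contains_join(a_phrases, b_phrases):
    """Return (a_phrase, b_phrase) pairs where b_phrase occurs inside a_phrase."""
    # Step 1: break A phrases into words and index the rows by word.
    rows_by_word = defaultdict(set)
    for idx, phrase in enumerate(a_phrases):
        for word in phrase.split():
            rows_by_word[word].add(idx)

    matches = []
    for b in b_phrases:
        # Step 2: equality match on words gives the candidate subset of A.
        candidates = set()
        for word in b.split():
            candidates |= rows_by_word.get(word, set())
        # Step 3: run the substring test only on the candidates.
        for idx in sorted(candidates):
            if b in a_phrases[idx]:
                matches.append((a_phrases[idx], b))
    return matches

a_phrases = ["the quick brown fox", "lazy dog sleeps", "quick silver"]
b_phrases = ["quick brown", "dog"]
print(contains_join(a_phrases, b_phrases))
```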
I think EXISTS may be faster, simply because your query returns the same row from A once for every match:
SELECT
A.*
FROM A as a
WHERE EXISTS (
SELECT
1
FROM B
WHERE a.phrase LIKE concat("%",phrase,"%")
);