Implementing a calculated field within my Tableau Viz - if-statement

I have data within tableau that I wish to show a breakdown of USED and FREE storage. However, I need to first filter a specific column to perform 2 different types of calculations. Here is the data
Total Free SKU
10 5 A
20 1 A
5 4 B
2 0 B
10 5 C
10 6 D
I am wanting to show a tableau bar chart that displays the available, used and total within Tableau. However, I need to first filter out by SKU:
I created this calculated field below as well as this calculated field:
Used = Total - Free
IF CONTAINS(ATTR([SKU]),'A') or
CONTAINS(ATTR([SKU]),'D')
THEN SUM([Total])
ELSEIF CONTAINS(ATTR([SKU]),'B') or
CONTAINS(ATTR([SKU]),'C')
THEN AVG([Total])
END
This is what I have done so far, but not sure how to incorporate the calculated field within the viz
Any suggestion is appreciated.

If I understand your problem correctly, proceed like this
Situation-1 You want to work at SKUG level
Create calculation fields each for total/USED/FREE as
SUM(ZN(IF CONTAINS([SKU], 'A') OR CONTAINS([SKU], 'D')
THEN [Total] END))
+
AVG(ZN(IF CONTAINS([SKU], 'B') OR CONTAINS([SKU], 'C')
THEN [Total] END))
Needless to say, please replace [total] by [used] or [free] as applicable
Situation-2 You want to work at higher level of detail instead. In this case you need to decide what you have to do with each of the SKU's group. Let's assume you want to add these. then creating similar fields will do. else replace + in a separate field with your desired operator(!).
Good luck!

Related

How to merger these two records ino one row removing Null value in Informatica using transformation. Please see the snapshot for scenario

enter image description here
Input-
Code value Min Max
A abc 10 null
A abc Null 20
Output-
Code value Min Max
A abc 10 20
You can use an aggregator transformation to remove nulls and get single row. I am providing solution based on your data only.
use an aggregator with below ports -
inout_Code (group by)
inout_value (group by)
in_Min
in_Max
out_Min= MAX(in_Min)
out_Max = MAX(in_Max)
And then attach out_Min, out_Max, code and value to target.
You will get 1 record for a combination of code and value and null values will be gone.
Now, if you have more than 4/5/6/more etc. code,value combinations and some of min, max columns are null and you want multiple records, you need more complex mapping logic. Let me know if this helps. :)

How to countif 56 exists in 156/56/2567 and only return true once? Google sheets

I have one sheet with data on my facebook ads. I have another sheet with data on the products in my store. I'm having trouble with some countifs where I'm counting how many times my product ID exists in a row where multiple numbers are. They are formatted like this: /2032/2034/2040/1/
It's easy on the rows where only one product ID exists but some rows have multiple ID's separated by a /. And I need to see if the ID exists as a exact match alone or somewhere between the /'s.
Rows with facebook ads data:
A1: /2032/2034/2040/1/
A2: /1548/84/2154/2001/
A3: /2032/1689/1840/2548/
Row with product data:
B1: 2034
C1: I need a countifs here that checks how many times B1 exists in column A. Lets say I have thousands of rows with different variations of A1 where B1 could standalone. How do I count this? I always need exact matches.
You can compare the number you want (56) with the REGEX #MonkeyZeus commented whith a little change -> "(?:^|/)"&B1&"(?:/|$)" so the end result is:
=IF(REGEXMATCH(A1, "(?:^|/)"&B1&"(?:/|$)"), true, false)
Example:
UPDATE
If you need to count the total of 56 in X rows you can change the "True / False" of the condition for "1 / 0" and then do a =SUM(C1:C5) on the last row:
=IF(REGEXMATCH(A1, "(?:^|/)"&B1&"(?:/|$)"), 1, 0)
UPDATE 2
Thanks for contributing. Unfortunately I'm not able to do it this way
since I have loads of data to do this on. Is there a way to do it with
a countif in a single cell without adding a extra step with "sum"?
In that case you can do:
=COUNTA(FILTER(A:A, REGEXMATCH(A:A, "(?:^|/)"&B2&"(?:/|$)")))
Example:
UPDATE 3
With the following condition you check every single possibility just by adding another COUNTIF:
=COUNTIF(A:A,B1) + COUNTIF(A:A, "*/"&B1) + COUNTIF(A:A, B1&"/*") + COUNTIF(A:A, "*/"&B1&"/*")
Hope this helps!
try:
=COUNTIF(SPLIT(A1, "/"), B1)
UPDATE:
=ARRAYFORMULA(IF(A2<>"", {
SUM(IF((REGEXMATCH(""&DATA!C:C, ""&A2))*(DATA!B:B="carousel"), 1, )),
SUM(IF((REGEXMATCH(""&DATA!C:C, ""&A2))*(DATA!B:B="imagepost"), 1, ))}, ))

Tableau check for multiple IDs across 2 groups

I want to create a set in tableau, which will show either one of these two values: Y or N
2 already existing columns are here important, "VAT-ID" and "CUSTOMER-ID". the new column should check if a customer-ID has multiple VAT-IDs. If yes, the value "Y" should be displayed, else "N".
The table looks like:
customer-id VAT-id in-both
123456 EE999999999 Y
654321 AA999999999 N
666666 GG999999999 N
123456 KK999999999 Y
654321 AA999999999 N
any Help would be appreciated, I have tried IF [CustomerID] = 1 AND Count([VAT-ID]) > 1 THEN 'Y' ELSE 'N' END which didn't work.
You are close. For this you need an LOD (Level of Detail) expression. LOD expressions allow you to do calculations at a different granularity then the view is rendered.
You can use:
if
{fixed [Customer-Id]: countd([VAT-id]) } > 1
then 'Y'
else 'N'
end
The LOD is the {fixed...}. The way you read this is you want to count the distinct number of VAT-id per each Customer-id. (eg 123456 will return 2; all others will return 1). Then you just wrap that in an If statement.

Generating rolling z-scores of panel data in Stata

I have an unbalanced panel data set (countries and years). For simplicity let's say I have one variable, x, that I am measuring. The panel data sorted first by country (a 3-digit numeric country-code) and then by year. I would like to write a .do file that generates a new variable, z_x, containing the standardized values of the variable x. The variables should be standardized by subtracting the mean from the preceding (exclusive) m time periods, and then dividing by the standard deviation from those same time periods. If this is not possible, return a missing value.
Currently, the code I am using to accomplish this is the following (edited now for clarity)
xtset weocountrycode year
sort weocountrycode year
local win_len = 5 // Defining rolling window length.
quietly: rolling sd_x=r(sd) mean_x=r(mean), window(`win_len') saving(stats_x, replace): sum x
use stats_x, clear
rename end year
save, replace
use all_data_PROCESSED_FINAL.dta, clear
quietly: merge 1:1 (weocountrycode year) using stats_x
replace sd_x = . if `x'[_n-`win_len'+1] == . | weocountrycode[_n-`win_len'+1] != weocountrycode[_n] // This and next line are for deleting values that rolling calculates when I actually want missing values.
replace mean_`x' = . if `x'[_n-`win_len'+1] == . | weocountrycode[_n-`win_len'+1] != weocountrycode[_n]
gen z_`x' = (`x' - mean_`x'[_n-1])/sd_`x'[_n-1] // calculate z-score
UPDATE:
My struggle with rolling is that when rolling is set up to use a window length 5 rolling mean, it automatically does window length 1,2,3,4 means for the first, second, third and fourth entries (when there are not 5 preceding entries available to average out). In fact, it does this in general - if the first non-missing value is on entry 5, it will do a length 1 rolling average on entry 5, length 2 rolling average on entry 6, ..... and then finally start doing length 5 moving averages on entry 9. My issue is that I do not want this, so I would like to avoid performing these calculations. Until now, I have only been able to figure out how to delete them after they are done, which is both inefficient and bothersome.
I tried adding an if clause to the -rolling- statement:
quietly: rolling sd_x=r(sd) mean_x=r(mean) if x[_n-`win_len'+1] != . & weocountrycode[_n-`win_len'+1] != weocountrycode[_n], window(`win_len') saving(stats_x, replace): sum x
But it did not fix the problem and the output is "weird" in the sense that
1) If `win_len' is equal to, say, 10, there are 15 missing values in the resulting z_x variable, instead of 9.
2) Even though there are "extra" missing values in z_x, the observations still start out as window length 1 means, then window length 2 means, etc. which makes no sense to me.
Which leads me to believe I fundamentally don't understand 1) what -rolling- is doing and 2) how an if clause works in the context of -rolling-.
Does this help?
Thanks!
I'm not sure I understand completely but I'll try to answer based on what I think your problem is, and based on a comment by #NickCox.
You say:
... when rolling is set up to use a window length 5 rolling mean...
if the first non-missing value is
on entry 5, it will do a length 1 rolling average on entry 5, length 2
rolling average on entry 6, ...
This is expected. help rolling states:
The window size refers to calendar periods, not the number of
observations. If there
are missing data (for example, because of weekends), the actual number of observations used by command may be less than
window(#).
It's not actually doing a "length 1 rolling average", but I get to that later.
Below some examples to see what rolling does:
clear all
set more off
*-------------------------- example data -----------------------------
set obs 92
gen dat = _n - 1
format dat %tq
egen seq = fill(1 1 1 1 2 2 2 2)
tsset dat
tempfile main
save "`main'"
list in 1/12, separator(4)
*------------------- Example 1. None missing ------------------------
rolling mean=r(mean), window(4) stepsize(4) clear: summarize seq, detail
list in 1/12, separator(0)
*------- Example 2. All but one value, missing in first window ------
use "`main'", clear
replace seq = . in 1/3
list in 1/8
rolling mean=r(mean), window(4) stepsize(4) clear: summarize seq, detail
list in 1/12, separator(0)
*------------- Example 3. All missing in first window --------------
use "`main'", clear
replace seq = . in 1/4
list in 1/8
rolling mean=r(mean), window(4) stepsize(4) clear: summarize seq, detail
list in 1/12, separator(0)
Note I use the stepsize option to make things much easier to follow. Because the date variable is in quarters, I set windowsize(4) and stepsize(4) so rolling is just computing averages by year. I hope that's easy to see.
Example 1 does as expected. No problem here.
Example 2 on the other hand, should be more interesting for you. We've said that what matters are calendar periods, so the mean is computed for the whole year (four quarters), even though it contains missings. There are three missings and one non-missing. summarize is computing the mean over the whole year, but summarize ignores missings, so it just outputs the mean of non-missings, which in this case is just one value.
Example 3 has missings for all four quarters of the year. Therefore, summarize outputs . (missing).
Your problem, as I understand it, is that when you face a situation like Example 2, you'd like the output to be missing. This is where I think Nick Cox's advice comes in. You could try something like:
rolling mean=r(mean) N=r(N), window(4) stepsize(4) clear: summarize seq, detail
replace mean = . if N != 4
list in 1/12, separator(0)
This says: if the number of non-missings for the window (r(N), also computed by summarize), is not the same as the window size, then replace it with missing.

Extracting dollar amounts from existing sql data?

I have a field with that contains a mix of descriptions and dollar amounts. With TSQL, I would like to extract those dollar amounts, then insert them into a new field for the record.
-- UPDATE --
Some data samples could be:
Used knife set for sale $200.00 or best offer.
$4,500 Persian rug for sale.
Today only, $100 rebate.
Five items for sale: $20 Motorola phone car charger, $150 PS2, $50.00 3 foot high shelf.
In the set above I was thinking of just grabbing the first occurrence of the dollar figure... that is the simplest.
I'm not trying to remove the amounts from the original text, just get their value, and add them to a new field.
The amounts could/could not contain decimals, and commas.
I'm sure PATINDEX won't cut it and I don't need an extremely RegEx function to accomplish this.
However, looking at The OLE Regex Find (Execute) function here, appears to be the most robust, however when trying to use the function I get the following error message in SSMS:
SQL Server blocked access to procedure 'sys.sp_OACreate' of component
'Ole Automation Procedures' because this component is turned off as
part of the security configuration for this server. A system
administrator can enable the use of 'Ole Automation Procedures' by
using sp_configure. For more information about enabling 'Ole
Automation Procedures', see "Surface Area Configuration" in SQL Server
Books Online.
I don't want to go and changing my server settings just for this function. I have another regex function that works just fine without changes.
I can't imagine this being that complicated to just extract dollar amounts. Any simpler ways?
Thanks.
CREATE FUNCTION dbo.fnGetAmounts(#str nvarchar(max))
RETURNS TABLE
AS
RETURN
(
-- generate all possible starting positions ( 1 to len(#str))
WITH StartingPositions AS
(
SELECT 1 AS Position
UNION ALL
SELECT Position+1
FROM StartingPositions
WHERE Position <= LEN(#str)
)
-- generate possible lengths
, Lengths AS
(
SELECT 1 AS [Length]
UNION ALL
SELECT [Length]+1
FROM Lengths
WHERE [Length] <= 15
)
-- a Cartesian product between StartingPositions and Lengths
-- if the substring is numeric then get it
,PossibleCombinations AS
(
SELECT CASE
WHEN ISNUMERIC(substring(#str,sp.Position,l.Length)) = 1
THEN substring(#str,sp.Position,l.Length)
ELSE null END as Number
,sp.Position
,l.Length
FROM StartingPositions sp, Lengths l
WHERE sp.Position <= LEN(#str)
)
-- get only the numbers that start with Dollar Sign,
-- group by starting position and take the maximum value
-- (ie, from $, $2, $20, $200 etc)
SELECT MAX(convert(money, Number)) as Amount
FROM PossibleCombinations
WHERE Number like '$%'
GROUP BY Position
)
GO
declare #str nvarchar(max) = 'Used knife set for sale $200.00 or best offer.
$4,500 Persian rug for sale.
Today only, $100 rebate.
Five items for sale: $20 Motorola phone car charger, $150 PS2, $50.00 3 foot high shelf.'
SELECT *
FROM dbo.fnGetAmounts(#str)
OPTION(MAXRECURSION 32767) -- max recursion option is required in the select that uses this function
This link should help.
http://blogs.lessthandot.com/index.php/DataMgmt/DataDesign/extracting-numbers-with-sql-server
Assuming you are OK with extracting the numeric's, regardless of wether or not there is a $ sign. If that is a strict requirement, some mods will be needed.