SAS EG Sum(Case When...) Error - sas

I have been working on a two-step process in SAS EG that creates a temporary table (connected to Netezza) and then uses that table to build the final summary table. In the final table, I am trying to create two calculated columns: the average of all sales quotes, and the average quote for only the customers who accepted the quote (the expectation is that customers who accept have a lower average quote than all inquiries combined). To separate customers who accepted the quoted offer from those who did not, I have tried both SUM(CASE WHEN ...) and SUM with a Boolean expression.
In the code below, the final three SUM expressions try to create the same column in different ways (I am just looking for one that works). The first two attempts return an error saying that it was unable to identify a function that satisfies the given argument types. The final attempt (which is where I started) fails with a syntax error, even though I believe the syntax is correct. The error points to just after the closing parenthesis that follows END. Any help would be greatly appreciated:
EXECUTE (
CREATE TABLE SAMPLE1 AS
SELECT DATE
,STATE
,SUM(INQRY_CN) AS INQ_COUNT
,(SUM(BOUND_IN)/INQ_COUNT) AS CLOSURE
,SUM(QUOTE_AMOUNT) AS AVG_QTD_AM
,SUM(QUOTE_AMOUNT*(DAY_OF_QUOTE <> '01JAN1901')) AS AVG_BND_AM
,SUM(QUOTE_AMOUNT*(BOUND_IN=1)) AS AVG_BND_AM
,SUM(CASE WHEN BOUND_IN=1 THEN QUOTE_AMOUNT
ELSE . END) AS AVG_BND_AM
FROM TEMP_TABLE
GROUP BY 1,2
ORDER BY 1,2
) BY NETEZZA;
DISCONNECT FROM NETEZZA;
QUIT;
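A likely cause of all three errors is that the text inside EXECUTE ( ... ) BY NETEZZA is passed to Netezza unchanged, so it must be Netezza SQL rather than SAS syntax. `.` is SAS's missing-value notation (SQL uses NULL), and multiplying a number by a Boolean comparison is a SAS shortcut that Netezza most likely rejects, which would explain the "unable to identify a function" message. Separately, the AVG_ columns are computed with SUM; AVG over a CASE expression that yields NULL for non-accepted quotes averages the accepted quotes only, because AVG ignores NULLs. The sketch below shows this conditional-aggregation pattern using Python's built-in sqlite3 module as a stand-in for Netezza; the sample rows are invented and only the column names come from the question.

```python
import sqlite3

# Invented sample rows; column names follow the question's TEMP_TABLE.
rows = [
    # (DATE, STATE, INQRY_CN, BOUND_IN, QUOTE_AMOUNT)
    ("2020-01-01", "TX", 1, 1, 100.0),
    ("2020-01-01", "TX", 1, 0, 300.0),
    ("2020-01-01", "TX", 1, 1, 200.0),
    ("2020-01-01", "OH", 1, 0, 500.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TEMP_TABLE (DATE TEXT, STATE TEXT, INQRY_CN INTEGER,"
    " BOUND_IN INTEGER, QUOTE_AMOUNT REAL)"
)
conn.executemany("INSERT INTO TEMP_TABLE VALUES (?, ?, ?, ?, ?)", rows)

query = """
SELECT DATE,
       STATE,
       SUM(INQRY_CN) AS INQ_COUNT,
       -- Repeat the aggregate instead of reusing the INQ_COUNT alias;
       -- multiplying by 1.0 avoids integer division.
       SUM(BOUND_IN) * 1.0 / SUM(INQRY_CN) AS CLOSURE,
       AVG(QUOTE_AMOUNT) AS AVG_QTD_AM,
       -- NULL (not SAS's '.') for non-bound rows; AVG skips NULLs.
       AVG(CASE WHEN BOUND_IN = 1 THEN QUOTE_AMOUNT ELSE NULL END) AS AVG_BND_AM
FROM TEMP_TABLE
GROUP BY DATE, STATE
ORDER BY DATE, STATE
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

If a sum of accepted quotes is wanted instead, `SUM(CASE WHEN BOUND_IN = 1 THEN QUOTE_AMOUNT ELSE 0 END)` is the usual form; for an average, ELSE 0 would pull the result down, so NULL is the right choice there.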

Related

How is it possible for DAX syntax to reference the original table name when using table variables?

This question comes from an example that I'm trying to understand in The Definitive Guide to DAX, Second Edition, chapter 4. If you want the sample Power BI file, you can download it from the book's companion website; it's Figure 4-26 in chapter 4. Here is the DAX code:
Correct Average =
VAR CustomersAge =
SUMMARIZE ( -- Existing combinations
Sales, -- that exist in Sales
Sales[CustomerKey], -- of the customer key and
Sales[Customer Age] -- the customer age
)
RETURN
AVERAGEX ( -- Iterate on list of
CustomersAge, -- Customers/age in Sales
Sales[Customer Age] -- and average the customer’s age
)
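The measure averages over the rows returned by SUMMARIZE rather than over Sales itself, so a customer with many purchases is counted once. A minimal Python analogue with made-up data (not the book's sample file) shows the difference: a set of distinct (CustomerKey, Customer Age) pairs plays the role of SUMMARIZE, and a plain average over that set plays the role of AVERAGEX. It mirrors only the arithmetic, not DAX's data lineage.

```python
# Made-up Sales rows: (CustomerKey, Customer Age), one row per sale.
sales = [
    (1, 20), (1, 20), (1, 20),  # customer 1 bought three times
    (2, 40),
]

# Averaging over sales rows over-weights frequent buyers.
naive_avg = sum(age for _, age in sales) / len(sales)

# SUMMARIZE-like step: keep only the distinct (CustomerKey, Customer Age) pairs.
customers_age = set(sales)

# AVERAGEX-like step: iterate over the pairs and average the age.
correct_avg = sum(age for _, age in customers_age) / len(customers_age)

print(naive_avg, correct_avg)
```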
I understand the logic behind how SUMMARIZE and AVERAGEX are used in this example, and the requirements are all clear. What's confusing to me is how AVERAGEX references Sales[Customer Age]. Since AVERAGEX is operating on the summarized CustomersAge table variable, I would have assumed that the syntax would have been something along the lines of:
AVERAGEX (
CustomersAge,
[Customer Age] -- This is the line that I assumed would be different
)
How is it that the code given in the book is correct? Does the table variable (and the summarized table it contains) somehow have pointers to the original underlying table and column names? And is that normal for writing DAX queries, to always reference the original underlying table and column names when using table variables for intermediate steps?
Yes, the columns have what's known as data lineage. Sometimes you even have to restore lineage if it gets lost. You can read more about it here: https://www.sqlbi.com/articles/understanding-data-lineage-in-dax/
Lars, to the best of my understanding, this is how I would explain it.
Creating a variable doesn't create a table that is added to the model. You can think of variables as steps or placeholders of a series of DAX expressions.
So in the case of the SUMMARIZE used in the CustomersAge variable in this code, the arguments of SUMMARIZE are actual columns in the model. When you perform calculations on that variable, the columns you can access are those model columns rather than new columns.
What the variable has done is to help you break down the process of writing the calculation and make it less complex.
The code you expected would have been valid if the CustomersAge variable had created a new column, say Age * 2 (for example with ADDCOLUMNS), and you needed the average over that column. Because that new column is not part of the model, you would reference it by its name alone, as you wrote.
I just got my copy of the book but I hope this helps a bit.

Power BI How to Sum Based on If a Column Contains String from Other Column

I have an Entity table with one row per entity. This table has three columns: Entity ID, Descriptor, and Metric. The Descriptor is a concatenation of numerous codes, and I would like to see the metrics broken down by code.
I originally just split the Descriptor column into numerous rows but that led to some data relationship issues so I'd like to do it without splitting the Descriptor column.
I tried doing the following DAX formula but it resulted in an error stating "the expression contains multiple columns, but only a single column can be used in a True/False expression that is used as a table filter expression"
Desired Output Metric = CALCULATE('Metric',CONTAINSSTRING('Entity Table'[Descriptor],'Code Table'[Code]))
Ultimately I'm not even sure I need this as a column, and it may be better as a measure...
Any help would be appreciated. Thank you!
You can get around "the expression contains multiple columns, but only a single column can be used in a True/False expression that is used as a table filter expression" by using FILTER inside your CALCULATE.
Here it is as a calculated column. I used an IF because the 'E' code evaluates to a blank and you wanted a 0.
Desired Output Metric =
VAR CodeMetric = CALCULATE(SUM('Entity Table'[Metric]), FILTER('Entity Table', CONTAINSSTRING('Entity Table'[Descriptor], 'Code Table'[Code])))
RETURN IF(CodeMetric > 0, CodeMetric, 0)
Here it is as a measure. Be careful to use this only at the Code detail level. In a measure you need aggregate functions to reference your columns, so I am using MIN(Code): for any single code, MIN() returns that code. If you use this at a higher summary level you may get odd answers, because it will only total the metric for the MIN() code in the data you are referencing.
Desired Output Metric =
VAR SelectedCode = MIN('Code Table'[Code])
VAR CodeMetric = CALCULATE(SUM('Entity Table'[Metric]), FILTER('Entity Table', CONTAINSSTRING('Entity Table'[Descriptor], SelectedCode)))
RETURN IF(CodeMetric > 0, CodeMetric, 0)
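The logic of both formulas can be checked outside Power BI. This Python sketch uses invented Entity and Code data (the semicolon-separated Descriptor format is an assumption) and, for each code, sums Metric over the entities whose Descriptor contains that code, returning 0 when nothing matches, which is what the IF wrapper achieves.

```python
# Invented rows: (Entity ID, Descriptor, Metric); Descriptor concatenates codes.
entities = [
    (1, "A;B", 10),
    (2, "B;C", 5),
    (3, "C", 7),
]
codes = ["A", "B", "C", "E"]


def metric_for_code(code: str) -> int:
    """Sum Metric over entities whose Descriptor contains the code (substring test)."""
    total = sum(metric for _, descriptor, metric in entities if code in descriptor)
    # Mirrors the IF(CodeMetric > 0, CodeMetric, 0) wrapper. In Python an empty
    # sum is already 0; in DAX the CALCULATE would return blank instead.
    return total if total > 0 else 0


result = {code: metric_for_code(code) for code in codes}
print(result)
```

Two things carry over to the DAX version: an entity counts toward every code it contains, so per-code totals can add up to more than the overall total, and a plain substring test also matches codes that are part of longer codes (for example "A1" inside "A10"). Wrapping both the code and the Descriptor in the delimiter before searching is one way to avoid the second issue, assuming the codes are delimited consistently.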

Can't measure from date column

I'm trying to count the days between a date in the column 'completionDate' and today.
The table name is 'Incidents (2)'.
I have a similar table called 'incidents' where it works.
The code:
DaysClosed = DATEDIFF('Incidents (2)'[completionDate].[Dag];TODAY();DAY)
The error i get:
'A single value for variation 'Dag' for column 'completionDate' in table 'Incidents (2)' cannot be determined. This can happen when a measure formula refers to a column that contains many values without specifying an aggregation such as min, max, count, or sum to get a single result.'
The error you get strongly depends on the context in which your formula is evaluated, which is why it might work on another table but not on this one. As @JBfreefolks correctly pointed out, you are specifying a column where a scalar value is expected. That can still work, depending on the context you evaluate the formula over (assuming it is a measure).
For instance, imagine a data set with 100 rows divided equally into four categories A, B, C, D. When you create a table visual with one row per category, each row has 25 underlying records that are used in any calculation added to that row (either a measure or an aggregate of any value). This means that a formula like DATEDIFF with a column reference gets 25 values for its first argument where it expects one.
There are several ways to solve the problem depending on your desired result.
Use an aggregation such as MAX, as @JBfreefolks suggested, to make sure a single value is selected from multiple values. The measure is still calculated over a group of records but summarizes them by taking the latest date.
Make sure the visual includes something like an ID column so that it doesn't group records and each row of the visual corresponds to a single record. Any measure added to this visual then sees a single date per row.
Use a calculated column instead. Calculated columns are always evaluated in row context first, and their values can be aggregated in a visual later on. Because TODAY() is evaluated when the column is computed, you need to refresh your report at least every day to keep the column up to date.
A more complicated way is to use an iterator like SUMX() or AVERAGEX() to force row context evaluation inside of a measure.
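The options differ in where the single date comes from. This minimal Python sketch uses invented incidents and a fixed date standing in for TODAY(), and contrasts a per-row difference (calculated column), a MAX taken before the difference (aggregated measure), and per-row differences averaged afterwards (an iterator such as AVERAGEX).

```python
from collections import defaultdict
from datetime import date

today = date(2024, 1, 31)  # fixed stand-in for TODAY() so results are repeatable

# Invented incidents: (category, completionDate)
incidents = [
    ("A", date(2024, 1, 1)),
    ("A", date(2024, 1, 21)),
    ("B", date(2024, 1, 30)),
]

# Calculated-column style: one completionDate per row, so the difference is well defined.
days_closed = [(cat, (today - d).days) for cat, d in incidents]

# Group the dates per category, as a visual with one row per category would.
groups = defaultdict(list)
for cat, d in incidents:
    groups[cat].append(d)

# Measure with MAX: reduce each group to one date first, then take the difference.
max_based = {cat: (today - max(ds)).days for cat, ds in groups.items()}

# Iterator style: difference per row inside the group, then average the results.
avg_based = {cat: sum((today - d).days for d in ds) / len(ds) for cat, ds in groups.items()}

print(days_closed, max_based, avg_based)
```

The MAX and iterator versions give different numbers for category A (10 versus 20 days), so which one fits depends on whether "days closed" should describe the most recent incident or the typical one.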
Good to see you solved it already, still posting as it might be helpful to others.
'Incidents (2)'[completionDate].[Dag] references a column. In your calculation it returns a table (multiple dates in the evaluation context) instead of the scalar value that DATEDIFF needs.
You need to make sure that 'Incidents (2)'[completionDate].[Dag] returns a scalar value. To do that you can rely on row context, or use an aggregation function such as MAX.

ERROR: Expression using IN has components that are of different data types

I am using the query below in SAS Enterprise Guide to count, for each offer_id, the distinct customers within a date range:
PROC SQL;
CREATE TABLE test1 as
select offer_id,
(Count(DISTINCT (case when date between '2016-11-13' and '2016-12-27' then customer_id else 0 end))) as CUSTID
from test
group by offer_id
;QUIT;
ERROR: Expression using IN has components that are of different data types
Note: offer_id is a character variable, whereas customer_id is a numeric variable.
Most likely the error is caused by comparing the numeric variable DATE to the character strings '2016-11-13'. If you want to specify a date literal in SAS you must specify the date in DATE9 format and append the letter D after the close quote.
date BETWEEN '13NOV2016'd AND '27DEC2016'd
Note that there is no reference to any external database in the posted code. But even if your source table was tdlib.tdtable instead of work.test you still need to use SAS syntax when writing SAS code. Let the Teradata engine figure out how to convert it for you.
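In SAS a date value is a number: the count of days since 01JAN1960. This Python sketch (the helper name sas_date_literal is hypothetical, and it accepts only the ddMONyyyy form) shows which numbers '13NOV2016'd and '27DEC2016'd stand for, so that BETWEEN compares a numeric DATE with numbers rather than with character strings.

```python
from datetime import date

SAS_EPOCH = date(1960, 1, 1)  # SAS date values count days from 01JAN1960
MONTHS = {
    "JAN": 1, "FEB": 2, "MAR": 3, "APR": 4, "MAY": 5, "JUN": 6,
    "JUL": 7, "AUG": 8, "SEP": 9, "OCT": 10, "NOV": 11, "DEC": 12,
}


def sas_date_literal(text: str) -> int:
    """Return the SAS date value for a ddMONyyyy string such as '13NOV2016'."""
    day = int(text[:2])
    month = MONTHS[text[2:5].upper()]
    year = int(text[5:])
    return (date(year, month, day) - SAS_EPOCH).days


start = sas_date_literal("13NOV2016")
end = sas_date_literal("27DEC2016")


def in_range(date_value: int) -> bool:
    """Numeric equivalent of: date BETWEEN '13NOV2016'd AND '27DEC2016'd."""
    return start <= date_value <= end


print(start, end)
```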
You don't make it clear whether this is being run on SAS or Teradata (via pass through).
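I'm guessing SAS, in which case your dates are not SAS date literals: a date literal needs the letter d after the closing quote and a value in DATE9 form, such as '13NOV2016'd ('2016-11-13'd is not a valid date constant). Without this, the dates are treated as character strings instead of numeric SAS date values.
The error message is slightly misleading, as SAS reports the BETWEEN comparison as if it were an IN expression.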

BY statement in SAS proc compare

What is the difference between using the ID statement and the BY statement in PROC COMPARE?
I understand the ID statement: when it is added, observations are matched and compared according to the ID variables.
But what exactly does the BY statement do?
I read the SAS documentation and searched online but couldn't understand it. Can anybody elaborate on it?
The way I understand it, the "by" statement makes proc compare do a separate comparison for each by group in the comparison data sets. It's basically like running a separate "proc compare" for each "by" group.
The "id" statement, on the other hand, matches records by key between the two data sets being compared and reports the number of matching observations and how many are in one data set but not in the other. You would use this if your data sets share a primary key, i.e. a combination of variables that uniquely identifies each record, and you want "proc compare" to compare each matching pair.
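A simplified Python model of the two behaviours may help; it is not PROC COMPARE's report format, and the data sets and function names are hypothetical. It assumes PROC COMPARE's default of pairing observations by position when there is no ID statement: the BY-like version splits both data sets into groups and compares each group separately by position, while the ID-like version matches records on a key and reports unmatched keys.

```python
# Hypothetical data sets as lists of records.
base = [
    {"region": "E", "id": 1, "x": 10},
    {"region": "E", "id": 2, "x": 20},
    {"region": "W", "id": 3, "x": 30},
]
compare = [
    {"region": "E", "id": 2, "x": 25},
    {"region": "E", "id": 1, "x": 10},
    {"region": "W", "id": 4, "x": 30},
]


def compare_by_position(a, b, var):
    """No ID: pair records by position, return positions where var differs."""
    return [i for i, (ra, rb) in enumerate(zip(a, b)) if ra[var] != rb[var]]


def compare_by_group(a, b, by, var):
    """BY-like: run a separate positional comparison for each BY group."""
    groups = sorted({r[by] for r in a} | {r[by] for r in b})
    return {
        g: compare_by_position([r for r in a if r[by] == g], [r for r in b if r[by] == g], var)
        for g in groups
    }


def compare_by_id(a, b, key, var):
    """ID-like: match records on the key, report differences and unmatched keys."""
    a_idx = {r[key]: r for r in a}
    b_idx = {r[key]: r for r in b}
    common = a_idx.keys() & b_idx.keys()
    return {
        "differ": sorted(k for k in common if a_idx[k][var] != b_idx[k][var]),
        "only_in_base": sorted(a_idx.keys() - b_idx.keys()),
        "only_in_compare": sorted(b_idx.keys() - a_idx.keys()),
    }


by_group_result = compare_by_group(base, compare, "region", "x")
by_id_result = compare_by_id(base, compare, "id", "x")
print(by_group_result)
print(by_id_result)
```

In this data the BY-style comparison flags both E rows because the records are in a different order, while the ID-style comparison pairs them correctly and flags only id 2. The two statements can also be combined, which amounts to running the key-based comparison within each BY group.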