I have to convert some SAS code. In other programming languages I am used to < being used in comparisons e.g. in pseudo-code: If x < y then z
In SAS, what is the < operator achieving here:
intck(month,startdate,enddate)-(day(enddate)<day(startdate))
I have been able to understand the functions using the reference documentation but I can't see anything relating to how '<' is being used here.
Just to go into a little more detail about what the code you have there is doing, it's an old school method to determine the number of months from one date to the next (possibly to calculate a birthday, for example).
Originally, SAS functions intck and intnx only worked with the number of "firsts of the month" between two dates (or the equivalent boundary for other intervals). So INTCK('month','31OCT2020'd,'01NOV2020'd) = 1, and INTCK('month','01OCT2020'd,'30NOV2020'd) = 1 as well. Not ideal! So you'd add in this particular bit of code, -(day(enddate)<day(startdate)), which says "if it has not been a full month yet, subtract one". It's equivalent to this:
if day(enddate) < day(startdate) then diff = intck('month',startdate,enddate) - 1;
else diff = intck('month',startdate,enddate);
There's now a better way to do this (yay!). intck and intnx handle it a bit differently, but it's the same idea. For intck it is the method argument, where 'c' for "continuous" is what you want in order to count complete intervals measured from the start date. For intnx it is the alignment argument, where 's' means "same" (so, move to the same point in the month).
So your code now should be:
intck('month',startdate,enddate,'c')
The symbol < is an operator in that expression. It is not a function call, like INTCK() and DAY() are in your expression.
SAS evaluates boolean expressions (like the less than test in your example) to 1 for TRUE and 0 for FALSE.
So your expression is subtracting 1 when the day of month of ENDDATE is smaller than the day of month of STARTDATE.
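Since the question is about converting this SAS expression to another language, here is a minimal Python sketch of the same arithmetic. It is an illustration only: the function names are hypothetical, and the boundary count is an assumption about how the default INTCK('month', ...) behaves (it simply counts firsts of the month crossed). Python, like SAS, lets a comparison result take part in arithmetic, with True counting as 1 and False as 0.

```python
from datetime import date


def month_boundaries(start: date, end: date) -> int:
    """Count the firsts-of-the-month crossed going from start to end."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def full_months(start: date, end: date) -> int:
    """Subtract 1 when the end day-of-month is earlier than the start day-of-month."""
    # The comparison yields True/False, which Python treats as 1/0 in arithmetic.
    return month_boundaries(start, end) - (end.day < start.day)
```

For example, full_months(date(2020, 10, 31), date(2020, 11, 1)) gives 0, because one boundary is crossed but 1 < 31 subtracts it again.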
Note: You can also do the reverse, treat a number as a boolean expression. For example in a statement like:
if (BASELINE) then PERCENT_CHANGE = (VALUE-BASELINE) / BASELINE ;
A missing value or a value of zero in BASELINE will be treated as FALSE and so in those cases the assignment statement does not run.
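For comparison, a rough Python analogue of that guard. It assumes a SAS missing value is modelled as None, which is a choice made for this sketch only, and percent_change is a hypothetical name. Python also treats 0 and None as false in an if test (a float NaN, however, is treated as true, so it is not a good stand-in for missing here).

```python
from typing import Optional


def percent_change(value: float, baseline: Optional[float]) -> Optional[float]:
    """Return (value - baseline) / baseline, or None when baseline is 0 or missing."""
    # None stands in for a SAS missing value here; 0 and None are both falsy.
    if baseline:
        return (value - baseline) / baseline
    return None
```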
I'm new to C++. I'm reading the Core Guidelines and I came across this:
P.1: Express ideas directly in code
In this, it says to use something like Month month() const; instead of int month();
So I have 2 questions, why is there a const at the end of the function and what does that do? And how is Month defined? Can you declare new functions with any name instead of things like int, double, float etc?
Thanks in advance
The point of the guideline is what it says: to put the ideas behind the interface in the code itself.
If you have some date type, and it has a member function int month();, that expresses the idea that you can retrieve a month from the date. And presumably, the returned int value represents the month of the year as stored within that date.
But... how does it represent the month? It's an integer. Is the month of January given the integer 0, as might be expected by a long-time programmer (of most languages)? Or maybe it is given the integer 1, because that's how dates are actually written? The code itself says nothing about this; you would have to look at the documentation for the function to find out. So if you see if(date.month() == 1), you don't know if the code is asking if it is January or February.
The alternative, Month month() const; returns the type Month. Now, what that type is is not specified. Whatever it is, that type can make it unambiguous what value is represented by "January". For example:
enum class Month { January, February, ...};
How is "January" encoded? By the enum value Month::January. Therefore, if you see the test if(date.month() == Month::January), then it is clear exactly what it is saying. It is not only unambiguous, it also reads almost exactly like English: is the month of the date the month of January.
This is what it means to express an idea in the code: the entirety of the idea is unambiguously expressed by the code. Make the code say what it is doing, not so much how it is doing it.
I am new to powerbi and trying to understand difference between CALCULATETABLE and CALCULATE. I read these 1,2 pages. I am not clear. It says that Whereas the CALCULATE function requires as its first argument an expression that returns a single value, the CALCULATETABLE function takes a table of values. Could anyone explain it further?
update 1
1) it seems that CALCULATETABLE works on a table of data while CALCULATE works on a single value. is there any difference in terms of output?
2) could anyone guide when it is better to use one of these functions over other?
These are the most fundamental functions in DAX. They are used when you need to change the context in which the expression (the first parameter of the function) is evaluated.
The difference between the two functions is related to the input type and the output type.
CALCULATE function takes as input an expression that evaluates to scalar and returns a scalar value.
CALCULATETABLE function takes as input an expression that evaluates to table and returns a table.
Therefore, if you need to change the context where a scalar expression is evaluated, use CALCULATE. If you need to change the context where a table expression is evaluated, use CALCULATETABLE.
Expressions
An expression that evaluates to a scalar is anything that returns a single value. For example, SUM(), MIN() and MAX() all return a single value.
An expression that evaluates to a table is anything that returns a table. For example, 'My Table'[My Field] and VALUES('My Table'[My Field]) both return a table.
Sources
Finally, how can you know the input type and output type of DAX functions? My favourite source is dax.guide
In a SAS data step, if one creates a character variable, one has to be careful to choose the right length in advance. The following data step returns a wrong result when var1='case2', since var2 is truncated to 2 characters and is equal to 'ab', which is obviously not what we want. The same happens when replacing var2='  ' with length var2 $2. This kind of procedure is quite prone to errors.
data b; set a;
var2 = '  ';
if var1 = 'case1' then var2='xy';
if var1 = 'case2' then var2='abcdefg';
run;
I was unable to find a way to just define 'var2' as a character, without having to care for its length (side note: if left unspecified, the length is 8).
Do you know if it is possible?
If not, can you perhaps suggest a more robust workaround, something similar to an SQL "case", "decode", etc., to allocate different values to a new string variable that does not suffer from this length issue?
SAS data step code is very flexible compared to most computer languages (and certainly compared to other languages created in the early 1970s) in that you are not forced to define variables before you start using them. The data step compiler waits to define the variable until it needs to. But like any computer program it has rules that it follows. When it cannot tell anything about the variable then it is defined as numeric. If it sees that the variable should be character it bases the decision on the length of the variable on the information available at the first reference. So if the first place you use the variable in your code is assigning it a string constant that is 2 bytes long then the variable has a length of 2. If it is the result of character function where the length is unknown then the default length is 200. If the reference is using a format or informat then the length is set to the appropriate length for the width of the format/informat. If there is no additional information then the length is 8.
You can also use PROC SQL code if you want. In that case the rules of ANSI SQL apply for how variable types are determined.
In your particular example the assignment of blanks to the variable is not needed, since all newly created variables are set to missing (all blanks in the case of character variables) when the data step iteration starts. Note that if VAR2 is not new (i.e. it is already defined in dataset A) then you cannot change its length anyway.
So just replace the assignment statement with a length statement.
data b;
set a;
length var2 $20;
if var1 = 'case1' then var2='ab';
if var1 = 'case2' then var2='abcdefg';
run;
SAS is not going to change the language at this point; they have too many users with existing code bases. Perhaps they will make a new language at some point in the future.
For a dataset in SSRS reports, I've created a calculated field with below expression and it works fine.
=(Fields!SalesAmount.Value+Fields!TaxAmt.Value)*Fields!Factor.Value
But sometimes we can have a factor value of 0. So, I replaced the above expression with the one below:
=iif(Fields!Factor.Value = 0,(Fields!SalesAmount.Value+Fields!TaxAmt.Value)*1,(Fields!SalesAmount.Value+Fields!TaxAmt.Value)*Fields!Factor.Value)
But this throws below exception:
The Value expression for the textrun ‘Textbox2.Paragraphs[0].TextRuns[0]’ uses an aggregate function on data of varying data types. Aggregate functions other than First, Last, Previous, Count, and CountDistinct can only aggregate data of a single data type.
Can someone please help resolve this issue?
This doesn't seem to be an issue with the IIF itself; rather, the datatypes seem to be mismatched. I have a few suggestions to try. First, you could try to multiply the true value by 1.0, as I assume the datatypes are decimal or double. The other option would be to use something like CInt to convert the value to the correct datatype. CInt would likely be inappropriate for decimals as it would round off any decimal values, but the idea is the same. Since your expression doesn't use any aggregates and only uses normal operators, it must relate to something other than the IIF.
Here's a useful link to break down the datatypes and conversion methods. Choose the most appropriate conversion and apply it to the 1.
EDIT: The expression should be the following.
=iif(Fields!Factor.Value = 0.00,
(Fields!SalesAmount.Value+Fields!TaxAmt.Value)*CDbl(1),
(Fields!SalesAmount.Value+Fields!TaxAmt.Value)*Fields!Factor.Value)
I imagine what I'm asking is pretty basic, but I'm not entirely certain how to do it in SAS.
Let's say that I have a range of variables, or an array, x1-xn. I want to be able to run a program that uses the number of variables within that range as part of its calculation. But I want to write it in such a way that, if I add variables to that range, it will still function.
Essentially, I want to be able to create a variable that if I have x1-x6, the variable value is '6', but if I have x1-x7, the value is '7'.
I know that :
var1=n(of x1-x6)
will return the number of non-missing numeric variables.. but I want this to work if there are missing values.
I hope I explained that clearly and that it makes sense.
Couple of things.
First off, when you put a range like you did:
x1-x7
That will always evaluate to seven items, whether or not those variables exist. That simply evaluates to
x1 x2 x3 x4 x5 x6 x7
So it's not very interesting to ask how many items are in that, unless you're generating that through a macro (and if you are, you probably can have that macro indicate how many items are in it).
But the range x1--x7 or x: both are more interesting problems, so we'll continue.
The easiest way to do this, if the variables are all of a single type (but an unknown type), is to create an array and then use the dim function.
data _null_;
x3=5; /* numeric, so every element of the array has the same type */
array _temp x1-x7;
count = dim(_temp);
put count=;
run;
That doesn't work, though, if there are multiple types (numeric and character) at hand. If there are, then you need to do something more complex.
The next easiest solution is to combine nmiss and n. This works if they're all numeric, or if you're tolerant of the log messages this will create.
data _null_;
x3='ABC';
count = nmiss(of x1-x7) + n(of x1-x7);
put count=;
run;
nmiss counts the missing values and n counts the nonmissing numeric values, so their sum is the total number of variables. Here x3 is counted in the nmiss group.
Unfortunately, there is not a c version of n, or we'd have an easier time with this (combining c and cmiss). You could potentially do this in a macro function, but that would get a bit messy.
Fortunately, there is a third option that is tolerant of character variables: combining countw with catq.
data _null_;
x3='ABC';
x4=' ';
count = countw(catq('dm','|',of x1-x7),'|','q');
put count=;
run;
This will count all variables, numeric or character, with no conversion notes.
What you're doing here is concatenating all of the variables together with a delimiter between them, so [x1]|[x2]|[x3]..., and then counting the number of "words" in that string, where a word is anything delimited by "|". Even missing values will create something, so .|.|ABC|.|.|.|. will have 7 "words".
The 'm' argument to CATQ tells it to even include missing values (spaces) in the concatenation. The 'q' argument to COUNTW tells it to ignore delimiters inside quotes (which CATQ adds by default).
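The concatenate-then-count idea is not specific to SAS. As a rough illustration, here is a Python sketch in which the csv module stands in for CATQ and COUNTW and None stands in for a missing value. Both stand-ins are assumptions of this sketch, and count_fields is a hypothetical name. Quoting every item keeps an embedded delimiter from being counted as a separator, which is the role the 'q' modifier plays above.

```python
import csv
import io


def count_fields(values, delimiter="|"):
    """Join values into one quoted, delimited line, then count the fields in it."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, quoting=csv.QUOTE_ALL)
    # Missing items (None) still contribute an empty quoted field.
    writer.writerow(["" if v is None else v for v in values])
    buf.seek(0)
    # The reader ignores delimiters that sit inside quotes.
    return len(next(csv.reader(buf, delimiter=delimiter)))
```

A naive "|".join(...).split("|") would count "a|b" as two items, whereas the quoted round trip counts it as one.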
If you use a version from before CATQ was available (I believe it was added sometime in 9.2), then you can use CATX, but you lose the modifiers, meaning you have more trouble with empty strings and embedded delimiters.