Brings sets together into one data. (in macro) in SAS - sas

i want to brings sets together into one data in macro. I have 1064 sets from zm_&next_name and i want to brings them into one data for example ----> data CramerSet;
I want to do it in macro

You could do it without a macro. Just use the : operator when defining your data sets.
This feature is nice when you have a constant string that represents the beginning strings of the data set. Even if the strings change, as long as your target strings are constant (like your zm_ data sets) this is a good solution.
data CramerSet;
set zm_:;
run;
Once you run this, check your log. You will see a readout of every zm_% data set that has been concatenated.
If you are in fact hellbent on doing this with a macro - just use the data step above and use the string constant as your macro argument. Then if your string constant changes (maybe you have 1,025 data sets that start with ym_..., just use the new string constant as your macro string.

Related

SAS: adding character variables in data step without setting the lenghth in advance

In a SAS data step, if one creates a character variable he has to be careful in choosing the right length in advance. The following data step returns a wrong result when var1=case2, since 'var2' is truncated to 2 characters and is equal to 'ab', which is obviously not what we want. The same happens replacing var2=' ' with length var2 $2. This kind of procedure is quite prone to errors.
data b; set a;
var2 = ' ';
if var1 = 'case1' then var2='xy';
if var1 = 'case2' then var2='abcdefg';
run;
I was unable to find a way to just define 'var2' as a character, without having to care for its length (side note: if left unspecified, the length is 8).
Do you know if it is possible?
If not, can you perhaps suggest a more robust turnoround, something similar to an sql "case", "decode", etc, to allocate different values to a new string variable that does not suffer from this length issue?
SAS data step code is very flexible compared to most computer languages (and certainly compared to other languages created in the early 1970s) in that you are not forced to define variables before you start using them. The data step compiler waits to define the variable until it needs to. But like any computer program it has rules that it follows. When it cannot tell anything about the variable then it is defined as numeric. If it sees that the variable should be character it bases the decision on the length of the variable on the information available at the first reference. So if the first place you use the variable in your code is assigning it a string constant that is 2 bytes long then the variable has a length of 2. If it is the result of character function where the length is unknown then the default length is 200. If the reference is using a format or informat then the length is set to the appropriate length for the width of the format/informat. If there is no additional information then the length is 8.
You can also use PROC SQL code if you want. In that case the rules of ANSI SQL apply for how variable types are determined.
In your particular example the assignment of blanks to the variable is not needed since all newly created variables are set to missing (all blanks in the case of character variables) when the data step iteration starts. Note that if VAR2 is not new (ie it is already defined in dataset A) then you cannot change its length anyway.
So just replace the assignment statement with a length statement.
data b;
set a;
length var2 $20;
if var1 = 'case1' then var2='ab';
if var1 = 'case2' then var2='abcdefg';
run;
SAS is not going the change the language at this point, they have too many users with existing code bases. Perhaps they will make a new language at some point in the future.

What does this block of SAS code do?

I have sas code that I need to partially convert to c++ code, however I am struggling understand its function. I have no experience with sas, and after a few hours of various tutorials and examples I have made very little progress. I don't have access to any of the input data or any corresponding output either. The code follows the following format, but I've changed the variable names:
data data1;
set data2;
output;
if type='ABCD' and zone=1 then do;
type='BCDE'; spec='CDE'; sub='ABCD DEF'; output;
type='EFGH'; spec='FGH'; output;
type='ABCD'; spec='DEF';
end;
The code then continues on, however I only need to understand the logic of this if statement. In the actual code there are many of these statements but they all follow the same structure, understanding one should help me to understand them all. The variable values are only important insofar as type and uniqueness, if variables here share a value then that is true in the original code as well, otherwise they are different.
I know that the program is designed to take combinations of type/spec/zone and convert them into other type/spec combinations but I can't seem to grasp the logic.
The DATA and SET statements define the target and source, respectively.
The first OUTPUT statement will insure that the target has at least one copy of every record read from the source data.
The code inside the DO END block of the IF/THEN statement will cause two additional records to be written when it runs. They will have different values for the TYPE, SPEC and SUB variables as the assignment statements indicate. At the end of the DO block the values of TYPE, SPEC and SUB will have been set to 'ABCD','DEF' and 'ABCD DEF', respectively.
So if your input is
TYPE,SPEC,SUB,ZONE
ABCD,UNK,UNK,0
ABCD,XX,YY,1
UNK,UNK,UNK,0
The values written by the part of the code you posted would be.
TYPE,SPEC,SUB,ZONE
ABCD,UNK,UNK,0
ABCD,XX,YY,1
BCDE,CDE,ABCD DEF,1
EFGH,FGH,ABCD DEF,1
UNK,UNK,UNK,0

SAS - How to determine the number of variables in a used range?

I imagine what I'm asking is pretty basic, but I'm not entirely certain how to do it in SAS.
Let's say that I have a range of variables, or an array, x1-xn. I want to be able to run a program that uses the number of variables within that range as part of its calculation. But I want to write it in such a way that, if I add variables to that range, it will still function.
Essentially, I want to be able to create a variable that if I have x1-x6, the variable value is '6', but if I have x1-x7, the value is '7'.
I know that :
var1=n(of x1-x6)
will return the number of non-missing numeric variables.. but I want this to work if there are missing values.
I hope I explained that clearly and that it makes sense.
Couple of things.
First off, when you put a range like you did:
x1-x7
That will always evaluate to seven items, whether or not those variables exist. That simply evaluates to
x1 x2 x3 x4 x5 x6 x7
So it's not very interesting to ask how many items are in that, unless you're generating that through a macro (and if you are, you probably can have that macro indicate how many items are in it).
But the range x1--x7 or x: both are more interesting problems, so we'll continue.
The easiest way to do this is, if the variables are all of a single type (but an unknown type), is to create an array, and then use the dim function.
data _null_;
x3='ABC';
array _temp x1-x7;
count = dim(_temp);
put count=;
run;
That doesn't work, though, if there are multiple types (numeric and character) at hand. If there are, then you need to do something more complex.
The next easiest solution is to combine nmiss and n. This works if they're all numeric, or if you're tolerant of the log messages this will create.
data _null_;
x3='ABC';
count = nmiss(of x1-x7) + n(of x1-x7);
put count=;
run;
nmiss is number of missing, plus n is number of nonmissing numeric. Here x3 is counted with the nmiss group.
Unfortunately, there is not a c version of n, or we'd have an easier time with this (combining c and cmiss). You could potentially do this in a macro function, but that would get a bit messy.
Fortunately, there is a third option that is tolerant of character variables: combining countw with catx. Then:
data _null_;
x3='ABC';
x4=' ';
count = countw(catq('dm','|',of x1-x7),'|','q');
put count=;
run;
This will count all variables, numeric or character, with no conversion notes.
What you're doing here is concatenating all of the variables together with a delimiter between, so [x1]|[x2]|[x3]..., and then counting the number of "words" in that string defining word as thing delimited by "|". Even missing values will create something - so .|.|ABC|.|.|.|. will have 7 "words".
The 'm' argument to CATQ tells it to even include missing values (spaces) in the concatenation. The 'q' argument to COUNTW tells it to ignore delimiters inside quotes (which CATQ adds by default).
If you use a version before CATQ is available (sometime in 9.2 it was added I believe), then you can use CATX, but you lose the modifiers, meaning you have more trouble with empty strings and embedded delimiters.

Evaluate string variable that contains macro reference

I've got a table with DataSet names in, some of which contain macro references in the name.
e.g. Monthly_Data_&YYMM (where YYMM is the latest month)
I want to keep the Table with this string, but then have a new variable with the evaluated DataSet name.
e.g. Monthly_Data_&YYMM, Monthly_Data_1612
I can't work out a way to do this. If I read the dataset as a macro variable it returns as the required name, but I can't then join it on the same row as the non evaluated reference.
I'm sure this must be possible, and probably quite easy, but I just can't get my head around how to do this.
Many thanks
You can use the resolve function to do this, e.g.
%let YYMM = 1601;
data mydata;
dsname = 'Monthly_Data_&YYMM';
dsname_resolved = resolve(dsname);
run;
N.B. all macro variables used in your column of names must be defined in your session with the correct values at the point when the resolve function executes. If two different data sets used the same macro variable in their name, but it took different values at different times, you will need to redefine the macro variable and run your logic separately, possibly via separate data steps or call symput + symget.

How do I prevent strings containing the word "yes" or "no" from printing out as true or false? [duplicate]

This question already has answers here:
How can I prevent SerializeJSON from changing Yes/No/True/False strings to boolean?
(7 answers)
Closed 6 years ago.
I'm currently setting a number of variables like so:
<cfset printPage = "YES">
Eventually, when I print these variables out, anything that I try to set to "YES" prints out as "true". Anything set to "NO", prints out as "false". I'm not opposed to using the YesNoFormat function. In fact I might end up using that function in this application, but in the mean time I would like to know if ColdFusion is actually storing the words "YES" and "NO" in memory, or if it is converting them to a boolean format behind the scenes.
If CF is storing my variables exactly the way that I declare them, how would I go about retrieving these variables as strings? If CF is changing the variables in some way, are there any special characters or keywords that I could use to force it to store the variables as strings?
Thank you to everyone that commented / answered. I did a little more experimenting and reading, and it seems that the serializeJSON function will automatically convert "Yes" to "true" and "No" to "false". I either need to deal with this problem in my javascript, or I can add a space in the affected properties to circumvent this behavior.
You already know how to display the boolean value as "yes" or "no" (using YesNoFormat()). I don't think there is a way to force ColdFusion to store a variable a certain way. It just doesn't support that. I guess you could call out the Java type directly by using JavaCast(). I just don't see why you would want to go through that extra work for something like this. You can certainly research that a bit more if you like. Here is a link to the document for JavaCast.
Have a look at this document regarding data types in ColdFusion. I will post some of the relevant points from that document here but please read that page for more information.
ColdFusion is often referred to as typeless because you do not assign types to variables and ColdFusion does not associate a type with the variable name. However, the data that a variable represents does have a type, and the data type affects how ColdFusion evaluates an expression or function argument. ColdFusion can automatically convert many data types into others when it evaluates expressions. For simple data, such as numbers and strings, the data type is unimportant until the variable is used in an expression or as a function argument.
ColdFusion variable data belongs to one of the following type categories:
Simple One value. Can use directly in ColdFusion expressions. Include numbers, strings, Boolean values, and date-time values.
Binary Raw data, such as the contents of a GIF file or an executable program file.
Complex A container for data. Generally represent more than one value. ColdFusion built-in complex data types include arrays, structures, queries, and XML document objects. You cannot use a complex variable, such as an array, directly in a ColdFusion expression, but you can use simple data type elements of a complex variable in an expression. For example, with a one-dimensional array of numbers called myArray, you cannot use the expression myArray * 5. However, you could use an expression myArray[3] * 5 to multiply the third element in the array by five.
Objects Complex constructs. Often encapsulate both data and functional operations.
It goes on to say this regarding Data Types:
Data type notes
Although ColdFusion variables do not have types, it is often convenient to use “variable type” as a shorthand for the type of data that the variable represents.
ColdFusion provides the following functions for identifying the data type of a variable:
IsArray
IsBinary
IsBoolean
IsImage
IsNumericDate
IsObject
IsPDFObject
IsQuery
IsSimpleValue
IsStruct
IsXmlDoc
ColdFusion also includes the following functions for determining whether a string can be represented as or converted to another data type:
IsDate
IsNumeric
IsXML
So in your code you could use something like IsBoolean(printPage) to check if it contains a boolean value. Of course that doesn't mean it is actually stored as a boolean but that ColdFusion can interpret it's value as a boolean.