Combining Field Values in SAS - sas

Suppose my variables are X and Y
The values of X are A, B, C
The values of Y are 1, 2, 3
There is also an ID field. The X and Y values are unique to each ID.
Using SAS, I want to create another variable that combines the values of X and Y with a period in between: A.1, B.2, C.3
How should I go about this? Thanks!

CATX function (or any CAT family function, really) will do it for you.
newfield = catx('.',x,y);

Related

SAS Studio: Finding Values of Column 2 in Column 1 Until Column 2 is Specific Value

I have a simple question that I can't seem to answer. I HAVE a large data set where I am searching for values of column 2 that are found in column 1, until column 2 is a specific value. Sounds like a DO loop but I don't have much experience using them. Please see image as this likely will explain better.
Essentially, I have a "starting" point (with the first_match flag=1). Then, I want to grab the value of column 2 in this row (B in this example). Next, I want to search for this value (B) in column 1. Once I find that row (with column 1 = B & column 2 = C), I again grab the value in column 2 (C). Again, I find where in column 1 this new value occurs and obtain the corresponding value of column 2. I repeat this process until column 2 has a value of Z. That's my stopping point. The WANT table shows my desired output.
My apologies if the above is confusing, but it seems like a simple exercise that I can't seem to solve. Any help would be greatly appreciated. Glad to supply further clarification as well.
Have & Want
I have tried PROC SQL to create flags and grab the appropriate rows, but the code is extremely bulky and doesn't seem efficient. Also, the example I laid out has a desired output table with 3 rows. This may not be the case as the desired output could contain between 1 and 10 rows.
This question has been asked and answered previously.
Path traversal can be done using a DATA Step hash object.
Example:
data have;
length vertex1 vertex2 $8;
input vertex1 vertex2;
datalines;
A B
X B
D B
E B
B C
Q C
C Z
Z X
;
data want(keep=vertex1 vertex2 crumb);
length vertex1 vertex2 $8 crumb $1;
declare hash edges ();
edges.defineKey('vertex1');
edges.defineData('vertex2', 'crumb');
edges.defineDone();
crumb = ' ';
do while (not last_edge);
set have end=last_edge;
edges.add();
end;
trailhead = 'A';
vertex1 = trailhead;
do while (0 = edges.find());
if not missing(crumb) then leave;
output;
edges.replace(key:vertex1, data:vertex2, data:'*');
vertex1 = vertex2;
end;
if not missing(crumb) then output;
stop;
run;
All paths in the data can be discovered with an additional outer loop iterating (HITER) over a hash of the vertex1 values.

Replacing observations with a previous set observation

I have three columns. One identifies the observations by F. The other column orders each observation within the same F, called T. The third column is a numerical value, called Q. I'd like all my values for Q greater than a certain value of T to be replaced by the values at a fixed T, within the same F. For example, I'd like all values of Q within the same F that have T > 6 to be equal to whatever value Q has for that F has for T = 6. If an F has a Q value of 40 at T=6 and a Q value of 50 at T=7, I want that Q at T=7 to say 40 as well.
This might not be the correct way of solving this, but it did the trick. If anyone has a better solution, please help me out.
xtset F T
gen Q_fixed = Q
replace Q_fixed = . if T > 6
replace Q_fixed = L.Q_fixed if Q_fixed == .

How do you write set membership in SAS?

Suppose we want to check whether a variable is not in the set of numbers {1,2,3}. How would you do this in SAS? Here is the code:
data test;
if x ^= {1,2,3} then x = NA;
run;
if not (X in (1,2,3)) then call missing(x);
You can do this a few other ways but this is typically most common.

Differentiating between missing and total in output of proc means?

I've got something like the following:
proc means data = ... missing;
class 1 2 3 4 5;
var a b;
output sum=;
run;
This does what I want it to do, except for the fact that it is very difficult to differentiate between a missing value that represents a total, and a missing value that represents a missing value. For example, the following would appear in my output:
1 2 3 4 5 type sumA sumB
. . . . . 0 num num
. . . . . 1 num num
Ways I can think of handling this:
1) Change missings to a cardinal value prior to proc means. This is definitely doable...but not exactly clean.
2) Format the missings to something else prior, and then use preloadfmt? This is a bit of a pain...I'd really rather not.
3) Somehow use the proc means-generated variable type to determine whether the missing is a missing or a total
4) Other??
I feel like this is clearly a common enough problem that there must be a clean, easy way, and I just don't know what it is.
Option 3, for sure . Type is simply a binary number with 1 for each class variable, in order, that is included in the current row and 0 for each one that is missing. You can use the CHARTYPE option to ask for it to be given explicitly as a string ('01101110' etc.), or work with math if that's more your style.
How exactly you use this depends on what you're trying to accomplish. Rows that have a missing value on them will have a type that suggests a class variable should exist, but doesn't. So for example:
data want;
set have; *post-proc means assuming used CHARTYPE option;
array classvars a b c d e; *whatever they are;
hasmissing=0;
do _t = 1 to dim(classvars);
if char(_type_,_t) = 1 and classvars[_t] = . then hasmissing=1;
end;
*or;
if cmiss(of classvars[*]) = countc(_type_,'0') then hasmissing=0;
else hasmissing=1; *number of 0s = number of missings = total row, otherwise not;
run;
That's a brute force application, of course. You may also be able to identify it based on the number of missings, if you have a small number of types requested. For example, let's say you have 3 class variables (so 0 to 7 values for type), and you only asked for the 3 way combination (7, '111') and the 3 two way combination 'totals' (6,5,3, ie, '110','101','011'). Then:
data want;
set have;
if (_type_=7 and cmiss(of a b c) = 0) or (cmiss(of a b c) = 1) then ... ; *either base row or total row, no missings;
else ... ; *has at least one missing;
run;
Depending on your data, NMISS may also work. That checks to see if the number of missings is appropriate for the type of data.
Joe's strategy, modified slightly for my exact problem, because it may be useful to somebody at some point in the future.
data want;
set have;
array classvars a b c d e;
do _t = 1 to dim(classvars);
if char(_type_,_t) = 1 and (strip(classvars[_t] = "") or strip(classvars[_t]) = ".") then classvars[_t] = "TOTAL";
end;
run;
The rationale for the changes is as follows:
1) I'm working with (mostly) character variables, not numeric.
2) I'm not interested in whether a row has any missing or not, as those are very frequent, and I want to keep them. Instead, I just want the output to differentiate between the missings and the totals, which I have accomplished by renaming the instances of non-missing to something that indicates total.

Assigning list elements to variables

The output from Mathematica with the following operation FactorInteger[28851680048402838857] is as follows:
{{3897424303, 1}, {7402755719, 1}}
My question is: how could I go about extracting the two prime numbers (without the exponents) and assign them to an arbitrary variable?
I basically want to retrieve two primes, whatever they may be, and assign them some variables.
Ex: x0 = 3897424303 and x1 = 7402755719
Thanks!
The output is a list and you can use list manipulating functions like Part ([[ ]]) to pick the pieces you want, e.g.,
{x0, x1} = FactorInteger[28851680048402838857][[All, 1]]
or, without Part:
{{x0,dummy}, {x1,dummy}} = FactorInteger[28851680048402838857];
Implicit in your question is the issue of handing parts of the expression that is returned as output from functions such as FactorInteger. Allow me to suggest alternatives.
1. Keep all of the values in a {list} and access each element with Part:
x = First /# FactorInteger[7813426]
{2, 31, 126023}
x[[1]]
x[[3]]
2
126023
2. Store factors as values of the function x, mimicking indexation of an array:
(This code uses MapIndexed, Function.)
Clear[x]
MapIndexed[
(x[First##2] = First##1) &,
FactorInteger[7813426]
];
x[1]
x[3]
2
126023
You can see all the values using ? or ?? (see Information):
?x
Global`x
x[1]=2
x[2]=31
x[3]=126023