Use array variables in subsetting IF without specifying number of array variables - sas

In SAS, I have a few columns of dollar values and several other columns. I don't care about a row of data if all of its dollar values are 0, and I don't want these rows in the final data. The columns that are dollar values are already in an array, Fix_Missing. I am working in a DATA step.
So, for example, I could do:
IF Fix_Missing{1} NE 0 OR Fix_Missing{2} NE 0 OR Fix_Missing{3} NE 0;
However, the number of variables in the array may change and I want code that works no matter how many variables are in it. Is there an easy way to do this?

In most functions that accept an open-ended list of arguments, you can pass the whole array with OF array[*].
if sum(of fix_missing[*]) > 0;
for example, would work, assuming fix_missing cannot have negative values. You can also use this with CATS, so for example:
if cats(of fix_missing[*]) ne '000';
would work for exactly three variables, and if you might have an unknown number of array elements you could even do this:
if cats(of fix_missing[*]) ne repeat('0',dim(fix_missing)-1);
(REPEAT returns its argument repeated n+1 times, hence the -1.)
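To see how these tests differ, here is a small illustration of the logic in Python (not SAS code). It mimics the SUM test, the CATS/REPEAT comparison, and a direct "any value is nonzero" check on hypothetical rows. The fmt helper is only a rough, assumed stand-in for how CATS turns numbers into text, and missing values are not modelled.

```python
# Illustration only (not SAS): compare three "keep this row?" tests
# on hypothetical rows of dollar amounts. Missing values are not modelled.

def keep_by_sum(values):
    # Mirrors: if sum(of fix_missing[*]) > 0;
    return sum(values) > 0

def fmt(v):
    # Rough stand-in for how CATS renders a number: 0 -> "0", 12.5 -> "12.5"
    return str(int(v)) if float(v).is_integer() else str(v)

def keep_by_cats(values):
    # Mirrors: if cats(of fix_missing[*]) ne repeat('0', dim(fix_missing)-1);
    # SAS REPEAT('0', n) returns n+1 zeros, so dim-1 yields dim zeros.
    return "".join(fmt(v) for v in values) != "0" * len(values)

def keep_by_any(values):
    # The condition the question asks for: at least one nonzero value.
    return any(v != 0 for v in values)

rows = [
    [0, 0, 0],       # all zero: should be dropped
    [0, 12.5, 0],    # one nonzero value: should be kept
    [100, -100, 0],  # nonzero values that cancel out in the sum
]
for r in rows:
    print(r, keep_by_sum(r), keep_by_cats(r), keep_by_any(r))
```

The last row shows why the SUM test needs the "no negative values" assumption, while the concatenation test and the direct check still keep it.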
Two other functions in the same vein, probably not useful here, are WHICHN and WHICHC (the numeric and character versions of the same thing). If you wanted to know whether ANY of them had a 0 in them:
if whichn(0,of fix_missing[*]) > 0;
WHICHN returns the position of the first zero it finds, or 0 if there is none.
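The lookup behaviour described above can be sketched in Python as an illustration (not SAS): the position is 1-based and 0 means "not found". The whichn function here is a hypothetical stand-in, and it ignores the case of a missing first argument.

```python
# Illustration only (not SAS): WHICHN-style lookup.
# Returns the 1-based position of the first argument equal to target, else 0.
def whichn(target, *values):
    for position, v in enumerate(values, start=1):
        if v == target:
            return position
    return 0

print(whichn(0, 5, 0, 3, 0))  # 2: the first zero is the second value
print(whichn(0, 5, 7))        # 0: no zero present
```

Because "not found" is 0, the result can be used directly as a true/false condition.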


Is there any function in SAS where we can read the exact value from the variable

Suppose I have a column called ABC that contains data like:
123_112233_66778_1122 or
123_112233_1122_11232 or
1122_112233_66778_123
So I want to generate the desired value, 1122, in the next column, like this: "1122". I have a long list where I need to cross-check the values in column ABC, and if an exact match is found, I need to generate the new value. However, I don't want to match a value like 112233, because it is not the value I am looking for.
For example, in all three lines given above, I only want to match the value "1122".
I really have no clue how to solve this problem. I have tried wildcards but without much success. Any help would be much appreciated.
It is hard to tell from your description, but from the values you show it looks like you want the INDEXW() function. It lets you search a string for a matching word, with an option to specify which characters are to be considered the separators between words. The result is the position where the word starts within the longer string. When the word is not found, the result is 0.
Let's create a simple example to demonstrate.
data have;
input abc $30. ;
cards;
123_112233_66778_1122
123_112233_1122_11232
1122_112233_66778_123
;
data want;
set have ;
location = indexw(trim(abc),'1122','_');
run;
Note that SAS will consider any value other than zero (or missing) as TRUE so you can just use the INDEXW() function call in a WHERE statement.
data want;
set have;
where indexw(trim(abc),'1122','_');
run;
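Here is a Python illustration (not SAS) of the whole-word matching INDEXW() performs with '_' as the only delimiter, using the three sample values from the question. The indexw function below is a hypothetical sketch that handles a single delimiter character only. Python strings carry no trailing blanks; in SAS a character variable is padded with blanks, and since blank is not in the delimiter list they would become part of the last word, which is presumably why the SAS code applies TRIM().

```python
# Illustration only (not SAS): INDEXW-like whole-word search with a single
# delimiter character. Returns the 1-based start of the word, or 0.
def indexw(source, word, delim="_"):
    position = 1
    for token in source.split(delim):
        if token == word:
            return position
        position += len(token) + len(delim)  # skip the token and its delimiter
    return 0

for abc in ["123_112233_66778_1122", "123_112233_1122_11232", "1122_112233_66778_123"]:
    print(abc, indexw(abc, "1122"))
```

This prints positions 18, 12 and 1, while a value that only contains 112233 gives 0, because "112233" is a different word from "1122".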

SAS, converting numbers, from character format to numeric format, keeping all leading zeros, but length of numbers is NOT uniform

I'm working in SAS EG and I'm trying to convert a column that's in character format to numeric format, EXACTLY as they appear in their character format. The numbers vary in length and some have one or two leading zeros.
If I do it one way, it gets rid of all leading zeros. Another way I tried, it adds leading zeros to the point that it's as long as the longest number in the column, e.g., a 9-digit number with one leading zero now has four leading zeros because the longest number in the column is 12 digits. (I hope this description makes sense).
I'm working in SAS EG. When I run proc contents, it tells me my existing variable is a character variable of length 26. It is blank for both 'format' and 'informat.'
I need to convert it so that a new column is a numeric variable, with length 8, and 'F12.' for 'format' and 'BEST12.' for 'informat,' as I plan to use it to match two data sets.
I created the following test data set in 'regular' SAS, but I'm not sure if it fully recreates the issue I'm working on in SAS EG:
data have;
input mrn $1-12;
cards;
118283586928
003875807
038087875
0385709873
0038576830
;
run;
As you can see, I have one number that's 12 digits long (no leading zeros); two that are 9 digits (with one or two leading zeros); and two that are 10 digits (with one or two leading zeros).
Any help would be greatly appreciated.
Thanks
You cannot store 26 digit strings exactly as a number in SAS. SAS stores numbers as floating point values. You can use the CONSTANT() function to see the end of the contiguous integers that can be stored exactly.
data _null_;
x=constant('exactint');
put x= comma30.;
run;
x=9,007,199,254,740,992
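So if the character variable actually holds values longer than 15 digits, you may not be able to convert them to numbers exactly: every integer up to the limit above is stored exactly, but beyond it some integers cannot be represented.
But if they are only 12 digits long, then just convert the strings into numbers and compare the numbers. The leading zeros disappear in the conversion, which does not matter for matching: "003875807" and "3875807" both become the number 3875807.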
proc sql;
create table want as
select *
from a, b
where a.mrn = input(b.mrn_string,32.)
;
quit;
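The same floating point limit can be checked in Python as an illustration. This assumes SAS on Windows or UNIX, where numbers are 8-byte IEEE doubles like Python floats; the 2**53 value matches the CONSTANT('EXACTINT') output shown above.

```python
# Illustration only: Python floats are IEEE doubles, like SAS numbers on
# Windows/UNIX, so the same exact-integer limit applies.
EXACTINT = 2 ** 53
print(f"{EXACTINT:,}")  # 9,007,199,254,740,992

# Just above the limit, neighbouring integers collapse to the same double.
print(float("9007199254740993") == float("9007199254740992"))  # True

# A 12-digit MRN is far below the limit, so it converts exactly.
print(float("118283586928") == 118283586928)  # True

# Leading zeros carry no numeric value, so differently padded strings
# convert to the same number, which is what makes the numeric join work.
print(float("003875807") == float("3875807"))  # True
```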
It's not possible to have different formats in the same column in SAS. The only way to keep them looking exactly as they do while in the same column is to keep them as text. If you need to do calculations on them I'd suggest just creating a 2nd column with their numeric values.
Leading zeros can be added to numbers using the z. format.
https://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000205244.htm
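Why a fixed-width zero-padding format cannot restore the original text can be shown with a short Python illustration using the test values from the question. The 012d format spec is only an assumed rough analogue of a SAS Z12. format.

```python
# Illustration only: once converted, a number no longer records how many
# leading zeros it had, so fixed-width zero padding (roughly what a Z12.
# format does) pads every value to the same width.
originals = ["118283586928", "003875807", "038087875", "0385709873", "0038576830"]
numbers = [int(s) for s in originals]
padded = [f"{n:012d}" for n in numbers]
for s, p in zip(originals, padded):
    print(s, p)
```

Only the 12-digit value comes back unchanged, which is why keeping the original character column alongside the numeric one is the way to preserve the original form.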

Marking strings in one list that exist in another list

I have 2 lists of values in 2 variables, which contain ZIP codes stored as strings, since they contain both numbers and letters. My first list contains 33,000 ZIP codes, the second list 1,400. Now I want to check whether the ZIP codes from the second variable are also in the first variable, and if so, give a third variable the code 1. If a ZIP code is not in both lists, it should get the code 0. I've tried comparing the datasets, but that only compares values in the same position. Writing a loop hasn't worked so far.
Hopefully someone can help! Thanks in advance.
Assuming you have the two lists in two SPSS datasets:
dataset activate list2.
compute InBothLists=1.
sort cases by zipcode.
dataset activate list1.
sort cases by zipcode.
match files /file=* /table=list2 /by zipcode.
recode InBothLists (sysmis=0).
execute.
In the code above, use your own dataset and variable names, and make sure you have the same variable name for the ZIP code in both lists.
Once you run this, the dataset list1 will have a new variable, InBothLists, with the value 1 for ZIP codes that also appear in list2 and 0 for those that do not (without the RECODE line they would be system-missing).
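The lookup itself is simple set membership. As an illustration in Python (not SPSS), with hypothetical ZIP values, each code in the short list is flagged 1 if it appears in the long list and 0 otherwise:

```python
# Illustration only: flag each ZIP code in the short list with 1 if it also
# appears in the long list, else 0. The ZIP values are hypothetical.
long_list = ["1011AB", "1012CD", "2511EF", "3011GH"]
short_list = ["1012CD", "9999ZZ", "3011GH"]

reference = set(long_list)  # hash set: each membership test is roughly O(1)
flags = {zip_code: int(zip_code in reference) for zip_code in short_list}
print(flags)  # {'1012CD': 1, '9999ZZ': 0, '3011GH': 1}
```

The match is on the exact string, so differences in case or stray spaces between the two lists would prevent a match.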

Strange sorting in Stata after doing encode

Variable X used to be a string. So I used the encode command to make it numeric.
But after that when I sort it, it's sorted in this way.
1000
10000
10001
10003
10005
1003
But usually, it should be sorted like
1000
1001
1003
1005
Why is sorting so strange after doing encode?
And it appears that 1003 created from encode and 1003 in the using dataset are considered different numbers.
Not strange at all. Right near the top of help encode Stata tells you "Do not use encode if varname contains numbers that merely happen to be stored as strings".
encode maps strings in alphabetical (here alphanumeric) order to numeric values 1 up (unless you specify otherwise with a label() option).
So "1000" will sort before "10000" before "1001", and so forth.
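The effect can be imitated in Python as an illustration (not Stata): sort the distinct strings as text and number them from 1. Plain ASCII digits should sort the same way in both, but this is only a model of encode, not its implementation.

```python
# Illustration only (not Stata): mimic encode by numbering the distinct
# strings 1, 2, 3, ... in string (alphanumeric) order.
values = ["1003", "1000", "10005", "10000", "10001", "10003"]
ordered = sorted(set(values))
codes = {s: i for i, s in enumerate(ordered, start=1)}

print(ordered)          # ['1000', '10000', '10001', '10003', '10005', '1003']
print(codes["1003"])    # 6, not 1003
print(sorted(int(v) for v in values))  # numeric order, as destring would give
```

This is also why the encoded value for "1003" does not equal the number 1003 in another dataset.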
You probably need destring but why was the variable read as string? That's what you need to worry about.
encode is for strings when you want a numeric equivalent. So "cat" "dog" "frog" "toad" will map to 1 2 3 4 and the string values will become value labels.
destring is for mistaken strings. The variable should be numeric, but something went wrong when reading the data. So, what was it that went wrong? Common errors include:
Header data from a spreadsheet that should be a variable label (or ignored) got read in as data.
Codes for missing data such as NA that make sense to people or to some other program but do not correspond to Stata representations of missing.
Garbage of some kind.
To check for problems, you could look at the values that wouldn't translate to numbers:
tab whatever if missing(real(whatever))
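The same check can be sketched in Python as an illustration (not Stata), with hypothetical values: collect the distinct values that fail numeric conversion. Python's float() and Stata's real() do not accept exactly the same spellings, so this only approximates the Stata command.

```python
# Illustration only (not Stata): list the distinct values that would not
# convert to numbers. The sample values are hypothetical.
def to_number(s):
    try:
        return float(s)
    except ValueError:
        return None  # stands in for Stata's missing value

raw = ["1000", "1003", "NA", "X (units)", "10005", "NA"]
problems = sorted({s for s in raw if to_number(s) is None})
print(problems)  # ['NA', 'X (units)']
```

Here "NA" plays the role of a missing-data code and "X (units)" a header row read in as data.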

SAS - selecting character observations from position 1 to position 2

I am stuck on one particular point. I have a character variable with observations extracted from an RTF document. I need to keep only the observations from obs A to obs B. The FIRSTOBS= and OBS= options are not helpful here because we do not know the observation numbers beforehand. All we know is the two unique strings. For example, in my dataset I need to create a dataset with observations from obs 11 to 16. This is only part of the dataset; the original dataset has over 1,500 observations, which is why we use unique text to capture them instead of observation numbers.
Thank you all in advance.
You don't explain enough, but odds are you can do something sort of like this if I understand you right (you have a "start" and a "stop" string in the document).
data want;
set have;
retain keepflag 0;
if strvar = "keepme" then keepflag=1;
if keepflag=1;
if strvar = "lastone" then keepflag=0;
drop keepflag;
run;
That is, have some condition set the flag variable to 1, then test for it, then put the off condition after that test (assuming you want to keep the row that matches the off condition). Use string functions like INDEX, FIND or SCAN to search for your particular string if it is not the entire value. You could also use regular expressions if necessary.
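The flag logic can be illustrated in Python (not SAS): a flag that persists from one line to the next, like a RETAINed variable, is switched on at the start marker and off after the stop marker, so both marker rows are kept. The marker strings are the placeholders from the answer; the sample lines are hypothetical.

```python
# Illustration only (not SAS): keep lines from a start marker through a
# stop marker, inclusive, using a flag that persists between lines.
def between(lines, start, stop):
    kept = []
    keep_flag = False
    for line in lines:
        if line == start:
            keep_flag = True   # switch on at the start marker
        if keep_flag:
            kept.append(line)  # the subsetting test
        if line == stop:
            keep_flag = False  # switch off after keeping the stop marker
    return kept

lines = ["title", "intro", "keepme", "row 1", "row 2", "lastone", "footer"]
print(between(lines, "keepme", "lastone"))
# ['keepme', 'row 1', 'row 2', 'lastone']
```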