SAS, Trailing with # - sas

I am trying to control where each field starts being read, so that the second field is read from its second character, the third field from its third character, and so on, rather than from the beginning. For example, this data set should give ONE WO REE R. My attempt with #n does not do this:
data TEST;
input #1 A $ #2 B $ #3 C $ #4 D $;
cards;
ONE TWO THREE FOUR
;
RUN;

How about
data TEST;
input A $ +1 B $ +2 C $ +3 D $;
cards;
ONE TWO THREE FOUR
;
RUN;

Related

Values of a common column in A replaced by those in B when using MERGE in SAS

I want to merge two tables that have two columns in common, and I do not want the value of var1 in A to be replaced by the value in B. Is there a way to do this without using drop or rename?
I can fix it with SQL, but I am curious whether it can be done with MERGE.
data a;
infile datalines;
input id1 $ id2 $ var1;
datalines;
1 a 10
1 b 10
2 a 10
2 b 10
;
run;
/* create table B */
data b;
infile datalines;
input id1 $ id2 $ var1 var2;
datalines;
1 a 30 50
2 b 30 50
;
run;
/* Merge A and B */
data c;
merge a (in=N) b(in=M);
if N;
by id1;
run;
But what I would like is:
data C;
infile datalines;
input id1 $ id2 $ var1 var2;
datalines;
1 a 10 50
1 b 10 50
2 a 10 50
2 b 10 50
;
run;
Use rename
data c;
merge a (in=N) b(in=M rename=(var1=var1_2));
by id1;
if N;
run;
If you don't want to use rename / drop etc., then you could just flip the merge order so that the dataset whose var1 should be kept is read last and overwrites the other:
data c;
merge b (in=M) a(in=N);
by id1;
if N;
run;
When the data step loads data from the datasets mentioned, it does so in the order that they appear on the MERGE (or SET or UPDATE) statement. So if you are merging two datasets and the BY variable values match, the record from the first is loaded and then the record from the second is loaded, overwriting the values read from the first.
For 1 to 1 matching you can just change the order that the datasets are mentioned.
merge b(in=M) a(in=N) ;
If you really want the variables defined in the output dataset in the order they appear in A, then add, before your MERGE statement, a SET statement that the compiler will process but that can never execute.
if 0 then set a b ;
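Putting the two pieces above together, a minimal sketch of the whole step might look like this. It only combines statements already shown in this thread and assumes the tables A and B from the question:

```sas
/* Sketch: B is read first and A last, so A's var1 wins on matching BY values. */
data c;
  if 0 then set a b;        /* never executes; only fixes the variable order at compile time */
  merge b (in=M) a (in=N);  /* values from A overwrite values from B */
  by id1;
  if N;                     /* keep only observations that have a record in A */
run;
```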
If you are doing a 1 to many matching then you might have other trouble since when a dataset stops contributing values to the current BY group then SAS does not re-read the last observation. In that case you will have to use some combination of RENAME=, DROP= or KEEP= dataset options.
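As a sketch of the dataset-option route for the question's data, one possibility is to let B contribute only the BY variable and var2. The KEEP= list below is an assumption about what is wanted from B; it also stops B's id2 from overwriting A's id2 on the first row of each BY group:

```sas
/* Sketch: B supplies only var2, so var1 and id2 always come from A. */
data c;
  merge a (in=N) b (keep=id1 var2);
  by id1;
  if N;
run;
```

Because var2 comes from a dataset named on the MERGE statement, its value is retained across the remaining rows of the BY group, which is what the one-to-many case needs here.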
In PROC SQL when you have duplicate names for selected columns (and are trying to create an output dataset instead of report) then SAS ignores the second copy of the named variable. So in a sense it is the reverse of what happens with the MERGE statement.

Missing String Value Interpolation SAS

I have a series of string variables with missing observations. I would like to use flat substitution. For instance, if variable x has 3 available values, each of them should have a 33.333% chance of being assigned to a missing value under this substitution method. How would I do this?
DATA have;
INPUT id a $ b $ c $ x;
CARDS;
1 Y Male . 5
2 Y Female . 4
3 . Female Tall 4
4 Y . Short 2
5 N Male Tall 1
;
Run;
You could use temporary arrays to store the possible values. Then generate a random index into the array.
DATA have;
INPUT id a $ b $ c $ x;
CARDS;
1 Y Male . 5
2 Y Female . 4
3 . Female Tall 4
4 Y . Short 2
5 N Male Tall 1
;
data want ;
set have ;
array possible_b (2) $8 _temporary_ ('Male','Female') ; /* _temporary_ keeps the array elements out of the output dataset */
if missing(b) then b=possible_b(1+int(rand('uniform')*dim(possible_b)));
run;
I did this with generating random numbers and hard coding the limits. There should be an easier way to do this, but for the purposes of the question this should work.
option missing='';
data begin;
input a $;
cards;
a
.
b
c
.
e
.
f
g
h
.
.
j
.
;
run;
data intermediate;
set begin;
if a EQ '' then help= rand("uniform");
else help=.;
run;
data wanted;
set intermediate;
if a EQ '' then do;
/* three equal-probability bands */
if 0<=help<1/3 then a='V1';
else if 1/3<=help<2/3 then a='V2';
else if 2/3<=help then a='V3';
end;
drop help;
run;

How to identify two rows with same characters at first stream

I want to create an indicator variable, "same_first_two_nearby", that indicates whether the first two characters of an observation are equal to those of a neighbouring observation. I tried the duplicate-removal method below, but it fails because it can only delete the duplicates, not keep and flag them.
PROC SORT data=temp NODUPKEY;
BY customer_IN;
RUN;
An example of my data follows.
data temp;
input customer_IN $ 1-8 ;
cards;
ADJOHN.
ADMARY.
ADjerry.
BWABBY.
CFLUCY.
CFLINDA.
EFLAGNA.
KTPAKAO.
KTWANDA.
;
run;
proc print data=temp;run;
I want to generate the results as the following.
customer_IN same_first_two_nearby
ADJOHN. 1
ADMARY. 1
ADjerry. 1
BWABBY. 0
CFLUCY. 1
CFLINDA. 1
EFLAGNA. 0
KTPAKAO. 1
KTWANDA. 1
Thanks in advance.
You can do this using a helper column containing the first two characters, provided that it is sorted as per the original question:
data temp;
input customer_IN $ 1-8 initials $ 1-2;
cards;
ADJOHN.
ADMARY.
ADjerry.
BWABBY.
CFLUCY.
CFLINDA.
EFLAGNA.
KTPAKAO.
KTWANDA.
;
run;
data want;
set temp;
by initials;
same_first_two_nearby = not(first.initials and last.initials);
run;
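If the rows are grouped so that equal initials sit next to each other but the groups are not in sorted order, a possible variant is the NOTSORTED option of the BY statement. This is a sketch that assumes the same temp dataset with the initials helper column; FIRST. and LAST. are then set for each run of consecutive equal values:

```sas
/* Sketch: flag rows whose initials match a neighbouring row, without requiring sorted input. */
data want;
  set temp;
  by initials notsorted;
  same_first_two_nearby = not (first.initials and last.initials);
run;
```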

Concatenating 3 variable name in a new variable within a datastep by using an array reference

I would like to create a variable called DATFL that would have the following value for the last observation:
DATFL
gender/scan
Here is the code :
data mix_ ;
input id $ name $ gender $ scan $;
datalines;
1 jon M F
2 jill F L
3 james F M
4 jonas M M
;
run;
data mix_3; set mix_;
length datfl datfl_ $ 50;
array m4(*) id name gender scan;
retain datfl;
do i=1 to dim(m4);
if index(m4(i) ,'M') then do;
datfl_=vname(m4(i)) ;
if missing(datfl) then datfl=datfl_;
else datfl=strip(datfl)||"/"||datfl_;
end;
end;
run;
Unfortunately, the value I get for DATFL at the last observation is 'gender/scan/gender/scan'. Because of the RETAIN statement that I used for DATFL, I end up with duplicates. At the end of this data step I was planning to use a CALL SYMPUT statement to load the last value into a macro variable, but I won't do that until I fix this issue. Can anyone give me guidance on how to prevent DATFL from having duplicate values at the end of the dataset? Cheers
sas_kappel
Don't retain DATFL. Instead, retain DATFL_.
data mix_3; set mix_;
length datfl datfl_ $ 50;
array m4(*) id name gender scan;
retain datfl_;
do i=1 to dim(m4);
if index(m4(i) ,'M') then do;
datfl_=vname(m4(i)) ;
if missing(datfl) then datfl=datfl_;
else datfl=strip(datfl)||"/"||datfl_;
end;
end;
if missing(datfl) then datfl = datfl_;
run;
It doesn't work. Let me change the dataset (mix_) so you can see that RETAIN DATFL_ is not working in this scenario.
data mix_ ;
input id $ name $ gender $ scan $;
datalines;
1 jon M M
2 Marc F L
3 james F M
4 jonas H M
;
run;
To summarize, what I want is to have the DISTINCT values of DATFL in a macro variable. For each record, the code that I proposed searches for variables containing the letter M; if one is found, DATFL receives the name of that array variable. If there are multiple variable names, they are separated by '/'. For the next records it should do the same, BUT add only the variable names that satisfy the condition AND are not already kept in DATFL. Currently, if you run my program, I have DATFL=gender/scan/name/scan/scan at observation 4, but I would like to have DATFL=gender/scan/name, because those are the distinct values. Ultimately, I will then write the following code:
if eof then CALL SYMPUT('DATFL',datfl);
sas_kappel
Your revised data makes it much clearer what you're looking for. Here is some code that should give the correct result.
I've used the CALL CATX routine to add new values to DATFL, separated by a /. The code first checks that the relevant variable name doesn't already exist in the string.
data mix_ ;
input id $ name $ gender $ scan $;
datalines;
1 jon M M
2 Marc F L
3 james F M
4 jonas H M
;
run;
data _null_;
set mix_ end=eof;
length datfl $100; /*or whatever*/
retain datfl;
array m4{*} $ id name gender scan;
do i = 1 to dim(m4);
if index(m4{i},'M') and not index(datfl,vname(m4{i})) then call catx('/',datfl,vname(m4{i}));
end;
if eof then call symput('DATFL', datfl);
run;
%put datfl = &DATFL.;

How to read first n number of columns into a SAS dataset?

My raw data file is a simple comma-delimited file that contains 10 columns. I know there must be a way of importing only, say, 3 columns using missover OR ls?
Sure, I can import the whole file and drop the unneeded variables, but how can I use
infile missover=3
OR
infile ls=3
?
No. SAS has to read all the data from a delimited file anyway: it has to read every byte from beginning to end to 1) process the column delimiters, and 2) determine the observation delimiters (CR and LF characters). So DROPing the unneeded columns wouldn't be much of an overhead.
If you need to read only the first n fields you can just list those in your input statement.
If the columns you need are more to the right you will need to list in your input all the columns before those you need and up to the last column you need.
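As a rough sketch of both cases, assuming the placeholder file name "myfile", character fields that fit the default length of 8, and hypothetical variable names (a, b, c, d, skip):

```sas
/* Sketch: read only the first three comma-separated fields of each line. */
data first3;
  infile "myfile" dlm="," dsd truncover;
  input a $ b $ c $;            /* the remaining fields on the line are never read */
run;

/* Sketch: keep fields 1 and 4; fields 2 and 3 are read into a throwaway variable. */
data cols_1_4;
  infile "myfile" dlm="," dsd truncover;
  input a $ skip $ skip $ d $;  /* skip is overwritten twice and then dropped */
  drop skip;
run;
```

TRUNCOVER is used here in place of MISSOVER as a design choice; for list input like this, both leave the remaining variables missing when a line runs out of fields instead of continuing on the next line.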
MISSOVER does not take =n. It tells SAS that, if a record does not contain enough columns for the INPUT statement, it should set the remaining variables to missing instead of looking for them in the following record. The next record is then read from its first column into the first variable of the INPUT statement as usual.
For example, if the file looks like this:
a1,b1,c1
a2,b2,c3,d2
data test;
infile "myfile" dlm="," dsd /*2 consecutive delimiters are not considered as one*/ missover;
input
a $
b $
c $
d $
;
run;
will result in a dataset:
a b c d
a1 b1 c1
a2 b2 c3 d2
data test;
infile "myfile" dlm="," dsd /*2 consecutive delimiters are not considered as one*/ ;
input
a $
b $
c $
d $
;
run;
will result in a dataset:
a b c d
a1 b1 c1 a2