Extract 2 Columns and Attach as Rows in SAS - sas

I have two datasets in SAS. The first looks like this (let's say it is called data 1 (I'm only concerned with two columns of it)
...and the second dataset (let's say it is called data 2) looks like this:
...and I am trying to extract the second column of the first dataset and insert it into the second dataset, to achieve something that looks like this:
Basic Problem Description:
I am trying to extract two columns from a dataset in SAS and add them as rows to a second dataset. The variable names in the first dataset are in a column of their own (entitled 'variable name') and in the second dataset each variable is a column header (a variable in itself) with corresponding data. The images I provided are overly simplistic, as the actual data itself is very long.
Basically, I am trying to find functions in SAS which allow me to do this.
What I have tried
-I have tried to extract the first two columns as a table using proc sql, converted them to a data frame using a data step, sorted them, then used proc transpose to try to convert them from long to wide, then tried to use some sort of append function to tack them on to the second dataset, but append did not work.
-I have tried to merge the two sets, but the merge does not seem to work after using proc transpose.
-I have also tried transposing the second dataset and then merging them, which worked (for some reason) but then I was not able to transpose the data back (so that I can analyze it, which is my purpose in doing all of this).
What functions would I use to go about this process?
Apologies for not providing replicable data, I am more searching for recommendations for functions rather than a detailed hard solution.

To force PROC TRANSPOSE to use a variable as the source for the new variable names use the ID statement. So if you have this first dataset:
data tall;
input fruit $ count ##;
cards;
APPLE 1 PEACH 2 PEAR 2
;
You can use this code to convert it.
proc transpose data=tall out=wide;
id fruit;
var count;
run;
Then if you have another dataset that already has the variables APPLE, PEACH, PEAR etc then just set the two together.
data want;
set wide have ;
run;

Related

Generating many tables from a single table in SAS

I have a table in SAS which contains the format information I want. I want to bin this data into the categories given.
What I don't know how to do is create either an xform or a format file from the data.
An example table looks like this:
TxtLabel Type FmtName label Hlo count
. I FAC1f 0 O 1
1996 I FAC1f 1 2
1997 I FAC1f 2 3
I want to date all years in a different data set as after 1997 OR before 1996.
The problem is that I know how to do this by hard coding it, but these files changes the numbers each time so I'm hoping to use the information in the table to generate the bins rather than hard code them.
How do I go about binning by data using a column from another dataset for my categorization?
Edit
I have two data sets, one which looks like the one I have included and one which has a column titled "YEAR". I want to bin the second data set using the categories from the first. In this case there are two available years in TxtLabel. There are multiple tables like this, I'm looking at how to generate PROC Format code from the table, rather than hard coding the values.
This should run to create the desired format
Proc FORMAT CNTLIN=MyCustomFormatControlData;
run;
You can then use it in a DATA Step, or apply it to a column in a data set.
Binning the data might be construed as 'data set splitting' but your question does not make it clear if that is so. Generic arbitrary splitting is often done with one of these techniques:
wall paper source code resolved from macro variables populated from information garnered in a Proc SQL or Proc FREQ step
dynamic data splitting using hash object for grouping records in memory, and saved to a data set with an .output() call.
Sample code for explicit binning
data want0 want1 want2 want3 want4 want5 wantOther;
set have;
* explicit wall paper;
select (put(year,FAC1f.));
when ('0') output want0;
when ('1') output want1;
when ('2') output want2;
when ('3') output want3;
when ('4') output want4;
when ('5') output want5;
otherwise output wantOther;
run;
This is the construct that source code generated by macro can produce, and requires
one pass to determine the when/output lines that are to be generated
a second pass to apply the lines of code that were generated.
If this is the data processing that you are attempting:
do some research (plenty of info out there)
write some code
make a new question if you get errors you can't resolve
Proc FORMAT
Proc FORMAT has a CNTLIN option for specifying a data set containing the format information. The structure and values expected of the Input Control Data Set (that CNTLIN) is described in the Output Control Data Set documentation. Some of the important control data columns are:
FMTNAME
specifies a character variable whose value is the format or informat name.
LABEL
specifies a character variable whose value is associated with a format or an informat.
START
specifies a character variable that gives the range's starting value.
END
specifies a character variable that gives the range's ending value.
As the requirements of the custom format to be created get more sophisticated you will need to have more information variables in the input control data set.

sas proc surveyselect with contstraint on column

i'm trying to create a unique sample dataset with proc surveyselect based on 2 columns.
I have a simple table with person_id and household_id. in this case person_id is my "primary key" which is the main input for creating a sample. BUT i need to make sure that I don't mix household_id between sample and base data.
So if household_id = 123 is the sample, it is not allowed to appear in the base data (even with another person_id) and vice versa.
do you have a handy idea? all my solution pre- or postprocessing will influence the sample sizes.
Thanks!!
E.
As you have found, proc surveyselect does not allow for this sort of constraint. You're going to have to accept a slight distortion in your sampling if you want to accommodate it. My suggestion would be to proceed as follows:
Use proc surveyselect to create a random sample
Identify all household_ids in the sample dataset that are also present in the base dataset. Let's say there are N of these.
Create another sample of size N from the base dataset with all of the household_ids in the original sample excluded.
Put all of the matching household_id rows back in the base dataset, remove them from the original sample, and append the new sample to the original sample.

Export/Import Attributes of a SAS dataset

I am working with multiple waves of survey data. I have finished defining formats and labels for the first wave of data.
The second wave of data will be different, but the codes, labels, formats, and variable names will all be the same. I do not want to define all these attributes again...it seems like there should be a way to export the PROC CONTENTS information for one dataset and import it into another dataset. Is there a way to do this?
The closest thing I've found is PROC CPORT but I am totally confused by it and cannot get it to run.
(Just to be clear I'll ask the question another way as well...)
When you run PROC CONTENTS, SAS tells you what format, labels, etc. it is using for each variable in the dataset.
I have a second dataset with the exact same variable names. I would like to use the variable attributes from the first dataset on the variables in the second dataset. Is there any way to do this?
Thanks!
So you have a MODEL dataset and a HAVE dataset, both with data in them. You want to create WANT dataset which has data from HAVE, with attributes of MODEL (formats, labels, and variable lengths). You can do this like:
data WANT ;
if 0 then set MODEL ;
set HAVE ;
run ;
This works because when the DATA step compiles, SAS builds the Program Data Vector (PDV) which defines variable attributes. Even though the SET MODEL never executes (because 0 is not true), all of the variables in MODEL are created in the PDV when the step compiles.
Importantly, note that if there are corresponding variables with different lengths, the length from MODEL will determine the length of the variable in WANT. So if HAVE has a variable that is longer than the same-named variable in MODEL, it may be truncated. Options VARLENCHK determines whether or not SAS throws a warning/error if this happens.
That assumes there are no formats/labels on the HAVE dataset. If there is a variable in HAVE that has a format/label, and the corresponding variable in MODEL does not have a format/label, the format/label from HAVE will be applied to WANT.
Sample code below.
data model;
set sashelp.class;
length FavoriteColor $3;
FavoriteColor="Red";
dob=today();
label
dob='BirthDate'
;
format
dob mmddyy10.
;
run;
data have;
set sashelp.class;
length FavoriteColor $10;
dob=today()-1;
FavoriteColor="Orange";
label
Name="HaveLabel"
dob="HaveLabel"
;
format
Name $1.
dob comma.
;
run;
options varlenchk=warn;
data want;
if 0 then set model;
set have;
run;
I'd create an empty dataset based on the existing one, and then use proc append to append the contents to it.
Create some sample data for the second round of data:
data new_data;
age = 10;
run;
Create an empty dataset based on the original data:
proc sql noprint;
create table want like sashelp.class;
quit;
Append the data into the empty dataset, retaining the details from the original:
proc append base=want data=new_data force nowarn;
run;
Note that I've used the force and nowarn options on proc append. This will ensure the data is appended even if differences are found between the two datasets being used. This is expected if you have, for example, format differences. It will also hide things like if columns exist in the new table that aren't in the old table etc. So be careful that this is doing what you want it to. If the behaviour is undesirable, consider using a datastep to append instead (and list the want dataset first).
Welcome to the stack.
If you want to copy the properties of the table without the data within it, you could use PROC SQL or data step with zero rows read in.
This examples copies all information about the SASHELP.CLASS dataset into a brand new dataset. All formats, attributes, labels, the whole thing is copies over. If you want to only copy some of the columns, specify them in select clause instead of asterix.
PROC SQL outobs=0;
CREATE TABLE WANT as SELECT * FROM SASHELP.CLASS;
QUIT;
Regards,
Vasilij

Probt in sas for column of values

Im looking do a probt for a column of values in sas not just one and to give two tailed p values.
I have the following code Id like to amend
data all_ssr;
x=.551447;
df=25;
p=(1-probt(abs(x),df))*2;
put p=;
run;
however I would like x to be a column of values within another file. I have tried work.ttest which is just a file of ttest values.
Many thanks
You need to use a set statement to access data from another SAS dataset.
data all_ssr;
set work.ttest; /*Dataset containing column of values*/
df=25;
p=(1-probt(abs(x),df))*2;
run;
Removing the put statement avoids clogging up the log.

SAS drop cloned observation

I have a dataset structured with an ID and two other variables.
The id is not unique, it appears in the dataset more than 1 time (a patient could receive more than one clinical treatment).How can I drop the entire observation (the entire line) only if it is a perfect clone of a previous observation (based on the other two variable values)? I don't want to use an insanely long if statement.
Thanks.
proc sql;
select distinct * from olddata;
quit;
Sounds like an easy SQL fix. The select distinct option will remove any completely duplicate rows in a dataset if you select all columns.
If you specifically want to identify if two consecutive lines are identical (but are not looking to match identical lines separated by other lines), you can use notsorted on a by statement and then first and last variables.
data want;
set have;
by id var1 var2 notsorted;
if first.var2;
run;
That will keep the first record for any group of identical id/var1/var2, so long as they're consecutive on the dataset. Of course if you sort the dataset by id var1 var2 first this will always remove the duplicates, but not sorted this still works for removing consecutive pairs (or more) that are collocated.
I prefer #JJFord's answer, but for the sake of completeness, this can also be done using the nodupe option in proc sort:
proc sort data=mydata nodupe;
by id;
run;
What you choose as the by variable doesn't really matter here. The important bit is just to specify the nodupe option.