How to move specific rows from an original table to a new table? - sas

In SAS, I have a table that have 1000 rows. I am trying to separate that table into two tables. Row1-500 to Table A and row501-100 to table B. What is the code that can do this function. Thank you all for helping!
I am searching the code online and cannot get anything on google, help is appreciated.

The DATA statement lists the output tables of the step. An OUTPUT statement explicitly sends a row to each of the output tables. An explicit OUTPUT <target> ... <target-k> statement sends records to the specified tables only. The automatic implicit loop index variable _n_ can act as a row counter when a single data set is being read with SET.
Try
data want1 want2;
set have;
if _n_ <= 500 then output want1; else output want2;
run;
However, you may be better served by creating a categorical variable that can be used later in WHERE or BY statements.

Maybe the set options will help.Try firstobs= and obs= to choose the rows you want.
Here is how to use them:
data want1;
set have(obs=500);
run;
data want2;
set have(firstobs=501 obs=1000);
run;

Related

SAS: How to create datasets in loop based on macro variable

I have a macro variable like this:
%let months = 202002 202001 201912 201911 201910;
As one can see, we have 5 months, separated by space ' '.
I would like to create 5 datasets like a_202002, a_202001, a_201912, a_2019_11, a_201910. How can I run this in loop and create 5 datasets, instead of writing the datastep 5 times?
Pseudo code:
for m in &months.
data a_m;
....
....
run;
How can I do that in SAS? I tried %do_over but that did not help me.
Use the knowledge you gained from #Tom answer in an earlier question to create the macro
%macro datasets_for_months ...
...
%mend;
Specify the output data sets in the DATA statement:
DATA %datasets_for_months(...);
...
RUN;
Direct rows to specific output data sets by naming the data set, such as
OUTPUT a_202002;
Note:
A step with no OUTPUT statements will implicitly output to each data set
A step with an OUTPUT statement will cause records to be written to either ALL data sets, or only the ones named in the statement:
OUTPUT writes records to all output data sets
OUTPUT data-set-name-1 writes records to only data sets specified
The DATA Step documentation covers what you need to know in greater detail
DATA Statement
Begins a DATA step and provides names for any output such as SAS data sets, views, or programs.
...
Syntax
Form 1:
DATA statement for creating output data sets
DATA <data-set-name-1 <(data-set-options-1)>>
... <data-set-name-n <(data-set-options-n)>>
... ;
THE ROAD AHEAD
You will likely discover that month will be better served in a conceptual role as a categorical variable in a single large data set, instead of breaking the data into multiple month-named data sets.
A categorical variable will let you leverage the power of SAS' partitioning and segregating statements such as WHERE, BY and CLASS when pursuing processing, reporting and visualization of your data at different combinations of class level values.
How about this approach? Create the data set names in another macro variable and use a single data step.
%let months = 202002 202001 201912 201911 201910;
data _null_;
ds = prxchange('s/(\d+)/a_$1/', -1, "&months.");
call symputx('ds', ds);
run;
options symbolgen;
data &ds.;
run;
You can use a %DO loop and the %SCAN() function. Use the COUNTW() function to find the upper bound.
%do i=1 %to %sysfunc(countw(&months,%str( )));
%let month=%scan(&months,&i,%str( ));
....
%end;

Adding a column is SAS using MODIFY (no sql)

I'm new to SAS and have some problems with adding a column to existing data set in SAS using MODIFY statement (without proc sql).
Let's say I have data like this
id name salary perks
1 John 2000 50
2 Mary 3000 120
What I need to get is a new column with the sum of salary and perks.
I tried to do it this way
data data1;
modify data1;
money=salary+perks;
run;
but apparently it doesn't work.
I would be grateful for any help!
As #Tom mentioned you use SET to access the dataset.
I generally don't recommend programming this way with the same name in set and data statements, especially as you're learning SAS. This is because it's harder to detect errors, since once run and encounter an error, you destroy your original dataset and have to recreate it before you start again.
If you want to work step by step, consider intermediary datasets and then clean up after you're done by using proc datasets to delete any unnecessary intermediary datasets. Use a naming conventions to be able to drop them all at once, i.e. data1, data2, data3 can be referenced as data1-data3 or data:.
data data2;
set data1;
money = salary + perks;
run;
You do now have two datasets but it's easy to drop datasets later on and you can now run your code in sections rather than running all at once.
Here's how you would drop intermediary datasets
proc datasets library=work nodetails holist;
delete data1-data3;
run;quit;
You can't add a column to an existing dataset. You can make a new dataset with the same name.
data data1;
set data1;
money=salary+perks;
run;
SAS will build it as a new physical file (with a temporary name) and when the step finishes without error it deletes the original and renames the new one.
If you want to use a data set you do it like this:
data dataset;
set dataset;
format new_column $12;
new_column = 'xxx';
run;
Or use Proc SQL and ALTER TABLE.
proc sql;
alter table dataset
add new_column char(8) format = $12.
;
quit;

Export/Import Attributes of a SAS dataset

I am working with multiple waves of survey data. I have finished defining formats and labels for the first wave of data.
The second wave of data will be different, but the codes, labels, formats, and variable names will all be the same. I do not want to define all these attributes again...it seems like there should be a way to export the PROC CONTENTS information for one dataset and import it into another dataset. Is there a way to do this?
The closest thing I've found is PROC CPORT but I am totally confused by it and cannot get it to run.
(Just to be clear I'll ask the question another way as well...)
When you run PROC CONTENTS, SAS tells you what format, labels, etc. it is using for each variable in the dataset.
I have a second dataset with the exact same variable names. I would like to use the variable attributes from the first dataset on the variables in the second dataset. Is there any way to do this?
Thanks!
So you have a MODEL dataset and a HAVE dataset, both with data in them. You want to create WANT dataset which has data from HAVE, with attributes of MODEL (formats, labels, and variable lengths). You can do this like:
data WANT ;
if 0 then set MODEL ;
set HAVE ;
run ;
This works because when the DATA step compiles, SAS builds the Program Data Vector (PDV) which defines variable attributes. Even though the SET MODEL never executes (because 0 is not true), all of the variables in MODEL are created in the PDV when the step compiles.
Importantly, note that if there are corresponding variables with different lengths, the length from MODEL will determine the length of the variable in WANT. So if HAVE has a variable that is longer than the same-named variable in MODEL, it may be truncated. Options VARLENCHK determines whether or not SAS throws a warning/error if this happens.
That assumes there are no formats/labels on the HAVE dataset. If there is a variable in HAVE that has a format/label, and the corresponding variable in MODEL does not have a format/label, the format/label from HAVE will be applied to WANT.
Sample code below.
data model;
set sashelp.class;
length FavoriteColor $3;
FavoriteColor="Red";
dob=today();
label
dob='BirthDate'
;
format
dob mmddyy10.
;
run;
data have;
set sashelp.class;
length FavoriteColor $10;
dob=today()-1;
FavoriteColor="Orange";
label
Name="HaveLabel"
dob="HaveLabel"
;
format
Name $1.
dob comma.
;
run;
options varlenchk=warn;
data want;
if 0 then set model;
set have;
run;
I'd create an empty dataset based on the existing one, and then use proc append to append the contents to it.
Create some sample data for the second round of data:
data new_data;
age = 10;
run;
Create an empty dataset based on the original data:
proc sql noprint;
create table want like sashelp.class;
quit;
Append the data into the empty dataset, retaining the details from the original:
proc append base=want data=new_data force nowarn;
run;
Note that I've used the force and nowarn options on proc append. This will ensure the data is appended even if differences are found between the two datasets being used. This is expected if you have, for example, format differences. It will also hide things like if columns exist in the new table that aren't in the old table etc. So be careful that this is doing what you want it to. If the behaviour is undesirable, consider using a datastep to append instead (and list the want dataset first).
Welcome to the stack.
If you want to copy the properties of the table without the data within it, you could use PROC SQL or data step with zero rows read in.
This examples copies all information about the SASHELP.CLASS dataset into a brand new dataset. All formats, attributes, labels, the whole thing is copies over. If you want to only copy some of the columns, specify them in select clause instead of asterix.
PROC SQL outobs=0;
CREATE TABLE WANT as SELECT * FROM SASHELP.CLASS;
QUIT;
Regards,
Vasilij

Probt in sas for column of values

Im looking do a probt for a column of values in sas not just one and to give two tailed p values.
I have the following code Id like to amend
data all_ssr;
x=.551447;
df=25;
p=(1-probt(abs(x),df))*2;
put p=;
run;
however I would like x to be a column of values within another file. I have tried work.ttest which is just a file of ttest values.
Many thanks
You need to use a set statement to access data from another SAS dataset.
data all_ssr;
set work.ttest; /*Dataset containing column of values*/
df=25;
p=(1-probt(abs(x),df))*2;
run;
Removing the put statement avoids clogging up the log.

Keeping only the duplicates

I'm trying to keep only the duplicate results for one column in a table. This is what I have.
proc sql;
create table DUPLICATES as
select Address, count(*) as count
from TEST_TABLE
group by Address
having COUNT gt 1
;
quit;
Is there any easier way to do this or an alternative I didn't think of? It seems goofy that I then have to re-join it with the original table to get my answer.
proc sort data=TEST_TABLE;
by Address;
run;
data DUPLICATES;
set TEST_TABLE;
by Address;
if not (first.Address and last.Address) then output;
run;
Using proc sort with nodupkey and dupout will dedupe the data and give you an "out" dataset with duplicate records from the original dataset, but the "out" dataset does not include EVERY record with the ID variable - it gives you the 2nd, 3rd, 4th...Nth. So you aren't comparing all the duplicate occurrences of the ID variable when you use this method. It's great when you know what you want to remove and define enough by variables to limit this precisely, or if you know that your records with duplicate IDs are identical in every way and you just want them removed.
When there are duplicates in a raw file I receive, I like to compare all records where ID has more than one occurrence.
proc sort data=test nouniquekeys
uniqueout=singles
out=dups;
by=ID;
run;
nouniquekeys deletes unique observations from the "out" DS
uniqueout=dsname stores unique observations
out=dsname stores remaining observations
Again, this method is great for working with messy raw data, and for debugging if your code might have produced duplicates.
That's easy using a data step:
proc sort data=TEST_TABLE nodupkey dupout=dups;
by Address;
run;
Refer to this documentation for further information
select field,count(field) from table
group by field having count(field) > 1