Subsetting records - sas

I have a table that looks like this:
Country
Austria
Belgium
Germany
Italy
....
I'm trying to subset some records (Austria and Belgium) of the variable Country in this way but it does not work:
DATA mydata;
SET mydata;
KEEP Austria Belgium;
RUN;
I used this piece of code from here https://stats.idre.ucla.edu/sas/modules/ubsetting-data-in-sas/ where some examples of subsetting data are reported. Unfortunately it does not work.
Can anyone help me please? I'm new in SAS.
Kind regards

What you are looking for is a sub-setting if
DATA mydata;
SET mydata;
if country in ('Austria', 'Belgium');
RUN;
You have to check if the variables equals those values. What you have in your code is telling SAS to keep columns Austria and Belguim in the output data. Those are values, not columns.

Related

enter column in a dataset to an array

I have 33 different datasets with one column and all share the same column name/variable name;
net_worth
I want to load the values into arrays and use them in a datastep. But the array that I use should depend on the the by groups in the datastep (country by city). There are total of 33 datasets and 33 groups (country by city). each dataset correspond to exactly one by group.
here is an example what the by groups look like in the dataset: customers
UK 105 (other fields)
UK 102 (other fields)
US 291 (other fields)
US 292 (other fields)
Could I get some advice on how to go about and enter the columns in arrays and then use them in a datastep. or do you suggest to do it in another way?
%let var1 = uk105
%let var2 = uk102
.....
&let var33 = jk12
data want;
set customers;
by country city;
if _n_ = 1 then do;
*set datasets and create and populate arrays*;
* use array values in calculations with fields from dataset customers, depending on which by group. if the by group is uk and city is 105 then i need to use the created array corresponding to that by group;
It is a little hard to understand what you want.
It sounds like you have one dataset name CUSTOMERS that has all of the main variables and a bunch of single variable datasets that the values of NET_WORTH for a lot of different things (Countries?).
Assuming that the observations in all of the datasets are in the same order then I think you are asking for how to generate a data step like this:
data want;
set customers;
set uk105 (rename=(net_worth=uk105));
set uk103 (rename=(net_worth=uk103));
....
run;
Which might just be easiest to do using a data step.
filename code temp;
data _null_;
input name $32. ;
file code ;
put ' set ' name '(rename=(net_worth=' name '));' ;
cards;
uk105
uk102
;;;;
data want;
set customers;
%include code / source2;
run;

Compare each row of one dataset with another dataset

Just a general question lets say I have two datasets called dataset1 and dataset2 and If I want to compare the rows of dataset1 with the complete dataset2 so essentially compare each row of dataset1 with dataset2. Below is just an example of the two datasets
Dataset1
EmployeeID Name Employeer
12345 John Microsoft
1234567 Alice SAS
1234565 Jim IBM
Dataset1
EmployeeID2 Name DateAbsent
12345 John 25/06/2009
12345 John 26/06/2009
1234567 Alice 27/06/2010
1234567 Alice 30/06/2011
1234567 Alice 2/8/2012
12345 John 28/06/2009
12345 John 25/07/2009
12345 John 25/08/2009
1234565 Jim 26/08/2009
1234565 Jim 27/08/2010
1234565 Jim 28/08/2011
1234565 Jim 29/08/2012
I have written some programming logic its not sas code, this is just my logic
for item in dataset1:
for item2 in dataset2:
if item.EmployeeID=item2.EmployeeID2 and item.Name=item2.Name then output newSet
This is an inner join.
proc sql noprint;
create table output as
select a.EmployeeId,
a.Name,
a.Employeer,
b.DateAbsent
from dataset1 as a
inner join
dataset2 as b
on a.EmployeeID = b.EmployeeID2
and a.Name = b.name;
quit;
I recommend reading the SAS documentation on PROC SQL if you are unfamiliar with the syntax
To do this in a Data step, the data sets need to be sorted by the variables to join on (or indexed). Also the variable names need to be the same, so I will assume both variables are EmployeeID.
/*sort*/
proc sort data=dataset1;
by EmployeeID Name;
run;
proc sort data=dataset2;
by EmployeeID Name;
run;
data output;
merge dataset1 (in=ds1) dataset2 (inds2);
by EmployeeID Name;
if ds1 and ds2;
run;
The data step does the loop for you. It needs sorted sets because it only takes 1 pass over the data sets. The if clause checks to make sure you are getting a value from both data sets.
Is your goal to compare the two dataset and see where there are differences? Proc Compare will do this for you. You can compare specific columns or the entire dataset.

SAS: PROC MEANS Grouping in Class Variable

I have the following sample data and 'proc means' command.
data have;
input measure country $;
datalines;
250 UK
800 Ireland
500 Finland
250 Slovakia
3888 Slovenia
34 Portugal
44 Netherlands
4666 Austria
run;
PROC PRINT data=have; RUN;
The following PROC MEANS command prints out a listing for each country above. How can I group some of those countries (i.e. UK & Ireland, Slovakia/SLovenia as Central Europe) in the PROC MEANS step, rather than adding another datastep to add a 'case when' etc?
proc means data=have sum maxdec=2 order=freq STACKODS;
var measure;
class country;
run;
Thanks for any help at all on this. I understand there are various things you can do in the PROC MEANS command itself (like limit the number of countries by doing this:
proc means data=have(WHERE=(country not in ('Finland', 'UK')
I'd like to do the grouping in the PROC MEANS command for brevity.
Thanks.
This is very easy with a format for any PROC that takes a CLASS statement.
Simply build a format, either with code or from data; then apply the format in the PROC MEANS statement.
proc format lib=work;
value $countrygroup
"UK"="British Isles"
"Ireland"="British Isles"
"Slovakia","Slovenia"="Central Europe"
;
quit;
proc means data=have;
class country;
var measure;
format country $countrygroup.;
run;
It's usually better to have numeric codes for country and then format those to be whichever set of names is needed at any one time, particularly as capitalization/etc. is pretty irritating, but this works well enough even here.
The CNTLIN= option in PROC FORMAT allows you to make a format from a dataset, with FMTNAME as the value statement, START as the value-to-label, LABEL as the label. (END=end of range if numeric.) There are other options also, the documentation goes into more detail.

Transposing one column in a dataset but by year and another column

I have this dataset here which looks like this:
Basically I want to manipulate the data set so that I have
GVKEY1 as unique such as 1004 then a unique year number such as 1996 then several gvkey2 after that. However the number of gvkey2 for each year is not the same. Does anyone know how to get around this problem? This means I will have several 12 lines of data for gvkey1 for 1004 since i have years from 1996 to 2008. Then for each year I will have many columns where each column will have a gvkey2.
Best Regards,
Naz
Can you not just use PROC TRANSPOSE?
proc sort data=your_data_set out=temp1;
by gvkey1 year;
run;
proc transpose data=temp1 out=temp2;
by gvkey1 year;
var gvkey2;
run;
This will give you a series of variables COL1 - COLx. Use the PREFIX option for different variable names.
I'm not sure I've understood your question, but if you're looking for unique gvkey1/year pairs, you could do either of these:
proc sql;
create table results as
select distinct gvkey1, year
from _your_data_set;
quit;
or
proc sort data=_your_data_set(keep=gvkey1 year) out=results nodupkey;
by gvkey1 year;
run;
If that's not what you're looking for, I suggest posting an example of the results you want.

SAS: rearrange field order in data step

In SAS 9, how can I in a simple data step, rearrange the order the field.
Data set2;
/*Something probably goes here*/
set set1;
run;
So if set1 has the following fields:
Name Title Salary
A Chief 40000
B Chief 45000
Then I can change the field order of set2 to:
Title Salary Name
Chief 40000 A
Chief 45000 B
Thanks,
Dan
Some quick googling gave me this method:
data set2;
retain title salary name;
set set1;
run;
from here:
http://analytics.ncsu.edu/sesug/2002/PS12.pdf
If you have a very large number of variables in your dataset sometimes it is easier to use an sql statement instead of a datastep. This allows you to list just the variables whose order you care about and use a wildcard to retain everything else.
proc sql noprint;
create table set2 as
select title, salary, *
from set1;
quit;
If you are doing this with a large table you can save yourself the IO overhead by creating a view instead. This can be applied to both the data set approach or the proc sql approach.
proc sql noprint;
create view set2 as
select title, *
from set1;
quit;
** OR;
data set2 / view=set2;
retain title salary name;
set set1;
run;
Cheers
Rob
You can also use an informat statement to do this - there is no need to specify any informats. I suspect this is slightly more efficient than an equivalent retain statement, as it allows SAS to initialise values to missing rather than retrieving them from the previous row. In practice the difference is minimal, and you also have the option of using a view.
data set2;
informat title salary name;
set set1;
run;
The variables specified in the informat statement are moved to the left of the dataset and into that order, and the rest are left as they were in the input dataset.
You can use anything that initializes the PDV with variables in the order you want (ATTRIB, ARRAY, FORMAT, INFORMAT, LENGTH, RETAIN).
Source: This SAS note: http://support.sas.com/kb/8/395.html