Tracking ID in SAS - sas

I have a SAS question. I have a dataset containing ID and year. I want to create the dummy variables "2011" and "2012", which should take the value 1 if the ID has an observation in the given year and 0 otherwise. E.g., ID 2 should have 2011=1 and 2012=0, since that ID only has an observation for 2011.
ID Year 2011 2012
1 2011 1 1
1 2012 1 1
2 2011 1 0
3 2012 0 1
Can anyone help? Thanks!

For one thing, 2011 or 2012 are not valid names for SAS variables. SAS variables must start with a letter or an underscore (e.g., _2011).
If you really need to, you can get around that limitation by setting the system option validvarname=any and surrounding your 'invalid' variable names with single quotes and appending an n.
This would do what you want:
data have;
infile datalines;
input ID year;
datalines;
1 2011
1 2012
2 2011
3 2012
;
run;
options validvarname=ANY;
proc sql;
create table want as
select ID
,year
,exists(select * from have b where year=2011 and a.id=b.id) as '2011'n
,exists(select * from have b where year=2012 and a.id=b.id) as '2012'n
from have a
;
quit;
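If the literal names 2011 and 2012 are not a hard requirement, a variant with ordinary variable names avoids validvarname=any altogether. This is only a sketch: it assumes the HAVE dataset created above, and the names want_valid, y2011 and y2012 are chosen here for illustration. It relies on PROC SQL remerging the group-level MAX back onto every row, which SAS reports with a note in the log.

```sas
proc sql;
  create table want_valid as
  select ID
        ,year
        /* (year=2011) evaluates to 1 or 0, so the MAX within an ID
           is 1 if any row of that ID has the year, otherwise 0 */
        ,max(year=2011) as y2011
        ,max(year=2012) as y2012
  from have
  group by ID
  order by ID, year
  ;
quit;
```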

Related

How to transpose EG5.1

I have a data set of approximately this format:
Table format:
ID 2012 2013 2014
A 1 3 .
B 2 . 4
And I want to transpose it to this format:
Table format:
ID Source Value
A 2012 1
A 2013 3
B 2012 2
B 2014 4
Using the Transpose task. I'm working in EG 5.1 and I've got a massive mental block on how to do this. Most of the guides are for doing this the opposite way around. Thanks so much in advance for any advice.
Use proc transpose instead. Create a new SAS program and run the following code:
proc transpose data = have
out = want(rename = (COL1 = Value)
where = (NOT missing(Value) )
)
name = Source;
by id;
var _NUMERIC_;
run;
Output:
ID Source Value
A 2012 1
A 2013 3
B 2012 2
B 2014 4
In Enterprise Guide, this is the Stack Columns task.

SAS sum by group and then create new variable for each group

I want to do summation for each group and create a new variable for the sum for each group. I tried proc sql, but it only created a new variable.
My dataset looks like:
data have;
input firm year product$ value;
datalines;
1 2012 a 5
1 2012 a 6
1 2012 b 3
1 2013 a 4
1 2013 a 3
1 2013 b 4
1 2013 b 3
2 2012 a 5
2 2012 a 6
2 2012 b 3
2 2012 b 4
2 2012 b 2
2 2013 a 4
2 2013 a 5
2 2013 b 3
2 2013 b 3
;
run;
What I want is a table with four columns: firm year productA_sum productB_sum.
I tried this way:
proc sql;
create table h.want as
select a.*, sum(a.value) as sumvalue
from h.have as a
group by firm, year, product;
quit;
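But it only creates a new column.
Because you group by three variables but select every column with a.*, the GROUP BY has no visible effect: value is not a grouping variable, so SAS remerges the sum back onto each original row instead of collapsing the rows.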
/*Try this one*/
proc sql;
create table h.want as
select a.firm, a.year, a.product, sum(a.value) as sumvalue
from h.have as a
group by firm, year, product;
quit;
To get separate SUM() results based on another variable's value, you need to use a CASE expression rather than including that variable in the grouping variables.
proc sql;
create table want as
select firm, year
, sum(case when (product='a') then value else . end) as sum_product_A
, sum(case when (product='b') then value else . end) as sum_product_B
from have
group by firm,year
;
quit;
If you want the sum to be zero instead of missing if the product never appears then replace the missing values in the else clauses with 0 instead.
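As a sketch of that zero-instead-of-missing variant, assuming the same HAVE dataset as in the question (the output name want_zero is made up here):

```sas
proc sql;
  create table want_zero as
  select firm, year
       /* ELSE 0 means a product with no rows contributes 0, not missing */
       , sum(case when (product='a') then value else 0 end) as sum_product_A
       , sum(case when (product='b') then value else 0 end) as sum_product_B
  from have
  group by firm, year
  ;
quit;
```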
You are pivoting an aggregate sum. A two step approach could be more desirable if there are more than two product values to contend with.
proc summary data=have nway noprint;
class firm year product;
var value;
output out=class_sums sum=sum;
run;
proc transpose data=class_sums suffix=_sum out=want(drop=_name_);
by firm year;
id product;
var sum;
run;

Defining a new field conditionally using put function with user-defined formats

I am trying to define a new value for an observation with a user defined format. However, my if/then/else statement seems to only work for observations with a year value of "2014". The put statements are not working for other values. In SAS, the put statement is blue in the first statement, and black in the other two. Here is a picture of what I mean:
Does anyone know what I am missing here? Here is my complete code:
data claims_t03_group;
set output.claims_t02_group;
if year = "2014" then test = put(compress(lookup,"_"),$G_14_PROD35.);
else if year = "2015" then test = put(compress(lookup,"_"),$G_15_PROD35.);
else test = put(compress(lookup,"_"),$G_16_PROD35.);
run;
Here is an example of what I mean when I say that the process seems to "work" for 2014:
As you can see, when the Year value is 2014, the format lookup works correctly, and the test field returns the value I am expecting. However, for years 2015 and 2016, the test field returns the lookup value without any formatting.
Your code utilises user-defined formats, $G_14_PROD.-$G_16_PROD.. My guess would be that there is a problem with one or more of these, but unless you can provide the format definitions it will be difficult to assist you further.
Try running the following and sharing the resulting output dataset work.prdfmts:
proc sql noprint;
select cats(libname,'.',memname) into :myfmtlib
from sashelp.vcatalg
where objname = 'G_14_PROD';
quit;
proc format cntlout = prdfmts library=&myfmtlib;
select $G_14_PROD $G_15_PROD $G_16_PROD;
run;
N.B. this assumes that you only have one catalogue containing a format with that name, and that the format definitions for all 3 formats are contained in the same catalogue. If not, you will need to adapt this a bit and run it once for each format to find and export the definition.
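To check that assumption first, a query along these lines lists every catalog that holds one of the three formats. It is a sketch that assumes they are character formats (stored with object type FORMATC), as the $ prefix in the question suggests.

```sas
proc sql;
  /* One row per catalog entry; more than one row per format name
     means the format exists in several catalogs */
  select libname, memname, objname, objtype
  from sashelp.vcatalg
  where objtype = 'FORMATC'
    and objname in ('G_14_PROD', 'G_15_PROD', 'G_16_PROD')
  ;
quit;
```

If the three formats turn out to live in different catalogs, the PROC FORMAT step would need to be run once per catalog with the matching library= value.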
Not that it solves your actual problem, but you could eliminate the IF/THEN by using the PUTC() function instead.
data have ;
do year=2014,2015,2016;
do lookup='00_01','00_02' ;
output;
end;
end;
run;
proc format ;
value $G_14_PROD '0001'='2014 - 1' '0002'='2014 - 2' ;
value $G_15_PROD '0001'='2015 - 1' '0002'='2015 - 2' ;
value $G_16_PROD '0001'='2016 - 1' '0002'='2016 - 2' ;
run;
data want ;
set have ;
length test $35 ;
if 2014 <= year <= 2016 then
test = putc(compress(lookup,'_'),cats('$G_',year-2000,'_PROD.'))
;
run;
Result
Obs year lookup test
1 2014 00_01 2014 - 1
2 2014 00_02 2014 - 2
3 2015 00_01 2015 - 1
4 2015 00_02 2015 - 2
5 2016 00_01 2016 - 1
6 2016 00_02 2016 - 2

Establishing treatment sample with Panel data in SAS

I have panel data that looks something like this:
ID year dummy
1234 2007 0
1234 2008 0
1234 2009 0
1234 2010 1
1234 2011 1
2345 2008 0
2345 2009 1
2345 2010 1
2345 2011 1
3456 2008 0
3456 2009 0
3456 2010 1
3456 2011 1
With more observations following the same pattern and many more variables that aren't relevant to this problem.
I want to establish a treatment sample of IDs where the dummy variable "switches" at 2010 (is 0 when year<2010 and 1 when year>=2010). In the example data above, 1234 and 3456 would be in the sample and 2345 would not.
I'm fairly new to SAS and I guess I'm not familiar enough with CLASS and BY statements to figure out how to do this.
So far I've done this:
data c_temp;
set c_data_full;
if year < 2010 and dummy=0
then trtmt_grp=1;
else pre_grp=0;
if year >=2010 and dummy=1
then trtmt_grp=1;
run;
But that doesn't do anything about the panel aspect of the data. I can't figure out how to do the last step of selecting only the IDs where trtmt_grp is 1 for every year.
All help is appreciated! Thanks!
I don't think you need a double DoW loop, unless you need to append the result to the other rows. A simple single pass should suffice if you just need a single row per matching ID.
data want;
set have;
by id;
retain grpcheck; *keep its value for multiple passes;
if first.id and year < 2010 then grpcheck=1; *reset for each ID to 1 (kept);
else if first.id and year ge 2010 then grpcheck=0;
if (year<2010) and (dummy=1) then grpcheck=0; *if a non-zero is found before 2010, set to 0;
if (year >= 2010) and (dummy=0) then grpcheck=0; *if a 0 is found at/after 2010, set to 0;
if last.id and year >= 2010 and grpcheck=1; *if still 1 by last.id and it hits at least 2010 then output;
run;
Any time you want to do some logic for each ID (or, each logically grouped set of rows by some variable's value), you start by setting your flag/etc. in an if first.id statement group. Then, modify your flag as appropriate for each row. Then, add an if last.id group which checks to see if the flag is still set when you've hit the last row.
I think you probably want a double DOW loop. First loop to calculate your TRTMT_GRP flag at the ID level and the second to select the detailed records.
data want ;
do until (last.id);
set c_data_full;
by id dummy ;
if first.dummy and dummy=1 and year=2010 then trtmt_grp=1;
end;
do until (last.id);
set c_data_full;
by id ;
if trtmt_grp=1 then output;
end;
run;
It seems to me that PROC SQL offers a pretty straightforward approach:
proc sql;
select distinct id from have
group by id
having sum(year<=2009 and dummy = 1)=0 and sum(year>=2010 and dummy=0) = 0
;
quit;
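The query above returns only the matching IDs. If the detail rows for those IDs are needed as well, one option is to use the same grouped query as a subquery. This is a sketch under the same assumptions (a dataset named HAVE with id, year and dummy); the output name want_rows is made up here.

```sas
proc sql;
  create table want_rows as
  select *
  from have
  where id in
    /* IDs with no dummy=1 before 2010 and no dummy=0 from 2010 on */
    (select id
     from have
     group by id
     having sum(year<=2009 and dummy=1)=0
        and sum(year>=2010 and dummy=0)=0)
  order by id, year
  ;
quit;
```

Like the original query, this does not check that an ID actually has observations on both sides of 2010.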

sas coding: choosing max variable

I have two tables and need to create one more table working with other two:
first_table:
id term
3 2014
3 2015
4 2014
4 2015
5 2013
5 2014
6 2013
6 2014
second_table:
id term majr_code
3 2010 ACT
3 2010 ACT
3 2011 GNST
3 2015 BUSA
3 2015 BUSA
4 2009 TIM
4 2010 BAL
4 2014 TAR
5 2011 SAR
5 2013 COR
6 2010 PAT
6 2013 TOR
These are the two tables I have. I need to create another table that is the same as the first table, with one more column, majr_code, added.
first_table:
id term majr_code
3 2014 GNST
3 2015 BUSA
4 2014 TAR
4 2015 TAR
5 2013 COR
5 2014 COR
6 2013 TOR
6 2014 TOR
What I need to do is this: for the same id, if the second table has the same term as the first table, I keep that majr_code. Otherwise I use the majr_code of the most recent earlier term in the second table. For example, if the first table has 2014 and the second table has 2011 and 2015, I need to use 2011's majr_code for the 2014 term. Likewise, if the first table has the terms 2013 and 2014 for the same id and the second table's highest term is 2013, I keep the 2013 majr_code for both 2013 and 2014.
I know it's complicated; it should be clearer if you check the tables and the result. If it is still too complicated, I can delete the question. This is the best I can explain it. Thanks!
I think the below code should do the trick. It works as follows:
1) Reads in the sample datasets.
2) Creates a table named second_table_nogaps, which is just second_table with no yearly gaps up through 2015. Basically, for each ID in the second table, it checks whether a given yearly record exists. If so, the record is output; if not, it creates a new record with the prior year's majr_code. If the last record for a given ID is not 2015, new records are generated up through 2015 (for example, a new record is created for id=4, term=2015, majr_code=TAR).
3) Merges the unique values of id+term+majr_code onto first_table. The resulting table First_table_2 should be what you're looking for! However, BE CAREFUL: if there are multiple majr_codes for the same id+term, this step will result in duplication.
Hope this helps! The code in step 2 could probably be simplified as my handling of the first and last record was not particularly efficient.
data first_table;
infile datalines ;
input id term;
datalines ;
3 2014
3 2015
4 2014
4 2015
5 2013
5 2014
6 2013
6 2014
;
run;
data second_table;
infile datalines ;
input id term majr_code $;
datalines ;
3 2010 ACT
3 2010 ACT
3 2011 GNST
3 2015 BUSA
3 2015 BUSA
4 2009 TIM
4 2010 BAL
4 2014 TAR
5 2011 SAR
5 2013 COR
6 2010 PAT
6 2013 TOR
;
run;
proc sort data=second_table ; by id term; run;
data second_table_nogaps (keep=id_nogaps term_nogaps majr_code_nogaps );
set second_table end=eof;
retain id_nogaps term_nogaps majr_code_nogaps ;
*first set up the first row... establishes retained variables and outputs;
if _N_ = 1 then do;
id_nogaps = id ;
term_nogaps = term;
majr_code_nogaps = majr_code;
output;
end;
*for all but the first and last row;
else if not eof then do;
do while ( (term_nogaps + 1 < term ) /*this is to fill in gaps between years. (e.g. major code in 2011 and major code in 2014 within the same id*/
or
((id_nogaps ne id) and term_nogaps < 2015) /*this is to fill major code for all terms up through 2015 (e.g. last major code for id 4 is in 2014)*/
);
term_nogaps = term_nogaps + 1;
output;
end;
id_nogaps=id;
term_nogaps = term;
majr_code_nogaps=majr_code;
output;
end;
else do;
do while (term_nogaps + 1 < term );
term_nogaps = term_nogaps + 1;
output;
end;
id_nogaps=id;
term_nogaps = term;
majr_code_nogaps=majr_code;
output;
do while ( term_nogaps < 2015 );
term_nogaps = term_nogaps + 1;
output;
end;
end;
run;
proc sql;
create table First_table_2 as
Select a.* , b.majr_code_nogaps as majr_code
from first_table a
left join
(select distinct id_nogaps, term_nogaps, majr_code_nogaps from second_table_nogaps) b /*select distinct values to prevent duplication*/
on a.id = b.id_nogaps and a.term = b.term_nogaps;
quit;
There are a few approaches to this, but SQL is probably the easiest. You don't provide code, so I'll just include a pointer: you need to use a HAVING clause to filter the table after it has been grouped, keeping the rows where term=max(term).
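One way to read that pointer, as a sketch rather than the answerer's own code: join each first_table row to every second_table row for the same id with an equal or earlier term, group by the first_table row, and keep only the match with the latest term. It assumes the first_table and second_table datasets shown earlier; the output name want_having is made up here.

```sas
proc sql;
  create table want_having as
  select a.id, a.term, b.majr_code
  from first_table a
  left join
       /* DISTINCT removes the repeated id+term rows in second_table */
       (select distinct id, term, majr_code from second_table) b
    on a.id = b.id and b.term <= a.term
  group by a.id, a.term
  /* keep the candidate with the latest term at or before a.term */
  having b.term = max(b.term)
  order by a.id, a.term
  ;
quit;
```

SAS remerges the MAX back onto the joined rows and writes a note about it to the log. As with the other answer, an id+term that carries two different majr_codes in second_table would still produce duplicate rows.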