Panel regression in SAS using subgroups in data set - sas

Below is a sample of my dataset:
Within the variable "Country" I have countries belonging to Group A and Group B (dummy variables).
I want to do a panel regression in SAS on the returns of these countries as such:
model Returns = Event(0,1)
with the added condition that, for example,
I only want to consider countries belonging to Group A, and during a Pre-2000 period.
Is there a way to code that in SAS using this current dataset?

SAS/ETS provides the proc panel procedure that will model panel data. Note that you must have identical time periods for each cross-section. If you don't, you'll need to prepare the data with proc timeseries or proc expand beforehand.
Once you read your data in, you'll use proc panel with a where statement to construct the model. The ID statement is a bit different in proc panel. It first expects the cross-section variable, then the time ID variable.
proc panel data=have;
where GroupA = 1
AND year(date) < 2000;
id country date;
class event;
model Returns = Event;
run;
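A slightly fuller sketch of the same idea, in case it helps. As far as I know, proc panel expects the input to be sorted by cross section and then by time, so this version sorts first and applies the subgroup filter in the same step. The output name have_sub and the choice of a one-way fixed-effects model (fixone) are my assumptions, not something stated in the question.

```sas
/* Keep only Group A observations before 2000 and sort by cross section, then time. */
proc sort data=have(where=(GroupA = 1 and year(date) < 2000))
          out=have_sub;               /* have_sub is a hypothetical name */
    by country date;
run;

proc panel data=have_sub;
    id country date;                  /* cross-section variable first, then time */
    model Returns = Event / fixone;   /* one-way fixed effects: an assumed choice */
run;
```

Since Event is already coded 0/1, it can enter the model as a plain numeric regressor here.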


How to select a single value from a table, to use for comparison(greater than/less than)?

I am handing over some code to a colleague, which is to be run daily to generate reports.
Once every month a new cycle starts, and we have to update the code for cycle_start_date
data mtd_table;
set ytd_table;
where entry_date> '10Mar2021'd; /*different every month*/
run;
Since he'll be running them from now on, along with other reports from other teams, I don't want to bother him every month to tweak the code. So I devised this:
I run (once a month):
data shared1.cycle_start_date;
cycle_start_date='10Mar2021'd;
run;
He runs (every day):
data mtd_table;
set ytd_table;
where entry_date>/*(select cycle_start_date from shared1.cycle_start_date)*/;
run;
I'm not sure how to correctly implement this (select cycle_start_date from shared1.cycle_start_date) part, since it is from proc sql. Would appreciate help.
When you store program parameters in a data set (called control data), a common pattern is for later code to extract the values into macro variables, so that other code can resolve them as source text when each step is compiled and run. Two ways to extract values into macro variables are:
Proc SQL, SELECT ... INTO :<macro-variable>, and
DATA _NULL_, CALL SYMPUT(<macro-variable>, <data step expression>);
Don't forget, macro resolution replaces the macro variable with its value as source code text. A date in a macro variable can be either the SAS date value (the text representation of the SAS date integer) or the text of a date literal (<dd-mon-yyyy>), which is written in source as "&<macro-variable>"D when it is to be used as a date value. The date literal text is also what you want when the date should be human readable in output; for example: TITLE "cycle start: &cycle_start_date";
Control data (you)
Rebuild or edit values in data set (name it parameters to be more useful)
data shared1.parameters;
cycle_start_date = '10Mar2021'd; * stored as a SAS date value (integer);
run;
Note: Some control data layouts use a name/value organization and have one row per parameter.
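A minimal sketch of that name/value layout, assuming a hypothetical data set shared1.parameters_nv with character columns name and value (neither is in the original post). Each row becomes one macro variable.

```sas
/* Hypothetical name/value control table: one row per parameter. */
data shared1.parameters_nv;
    length name $32 value $200;
    name = 'cycle_start_date';
    value = '10-Mar-2021';      /* stored as date literal text */
    output;
run;

/* Turn every row into a macro variable named after the NAME column. */
data _null_;
    set shared1.parameters_nv;
    call symputx(name, value);  /* CALL SYMPUTX strips leading and trailing blanks */
run;

%put &=cycle_start_date;
```

With this layout, adding a new parameter means adding a row rather than changing the table structure.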
Other
Extract date value as SAS date value text, and as date literal text portion and use.
proc sql noprint;
select
cycle_start_date
, cycle_start_date format=date11.
into
:cycle_start_date_value trimmed
, :cycle_start_date_literal trimmed
from
shared1.parameters
;
%put &=cycle_start_date_value;
%put &=cycle_start_date_literal;
/*
* will log the macro variable value as follows:
* CYCLE_START_DATE_VALUE=22349 and
* CYCLE_START_DATE_LITERAL=10-MAR-2021
*/
data ...
set ...;
where date >= &cycle_start_date_value; *resolve parameter as text representation of a SAS date value (integer);
...
title "Cycle starts: &cycle_start_date_literal";
proc print data=...; * title in output shows human readable part of date;
run;
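The DATA _NULL_ route mentioned earlier could look roughly like this. It is a sketch that assumes shared1.parameters holds a single row, and it uses CALL SYMPUTX rather than CALL SYMPUT so the values are stored without padding blanks.

```sas
data _null_;
    set shared1.parameters;
    /* numeric SAS date value, stored as text, e.g. 22349 */
    call symputx('cycle_start_date_value', cycle_start_date);
    /* human-readable text usable inside a date literal */
    call symputx('cycle_start_date_literal', put(cycle_start_date, date11.));
run;

%put &=cycle_start_date_value;
%put &=cycle_start_date_literal;
```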
Another approach is to use a common source code file that is %included by others. You would edit or recreate the parameters file by whatever process you want.
parameters.sas
%let cycle_start_date = 10-Mar-2021;
use
%include 'parameters.sas';
data ...
set ...;
where date >= "&cycle_start_date"D; *resolve parameter as part of date literal;
...
title "Cycle starts: &cycle_start_date";
proc print data=...; * title in output shows human readable part of date literal;
run;
One possible solution would be to put the date from the cycle_start_date table that is in the shared library shared1 into a macro-variable date that will be used in your data step to filter the ytd_table table based on the entry_date variable.
proc sql noprint;
select cycle_start_date into :date
from shared1.cycle_start_date;
quit;
data mtd_table;
set ytd_table;
where entry_date > &date.;
run;
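If you would rather avoid macro variables altogether, the subquery from the question works as written once the whole step is moved into proc sql. This is a sketch that assumes shared1.cycle_start_date contains exactly one row; a scalar subquery returning more than one row will fail.

```sas
proc sql;
    create table mtd_table as
    select *
    from ytd_table
    where entry_date > (select cycle_start_date
                        from shared1.cycle_start_date);
quit;
```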

I am trying to figure out how to sort this Dataset. SAS Beginner

How do I start this?
I have two data sets.
For the output you will deliver:
It should be an excel or XML format
Each query logic/programmed check should be on each tab
Columns should be
Subject #,
Visit Date (You will need the Visit Date Listing also attached)
Visit Name (Visit date from the file_34422 must match Visit name in the Blood Pressure File)
Date of Assessment (From the BP Log), VSBPDT_RAW, VSTPT, BP results.
A column for SYBP1. SYBP2, SYBP3, DIABP1, DIABP2, DIABP3
Findings/query text.
Below are Specification for BP:
For same SUBJECT and same FOLDERNAME, where VSTPT is Blood Pressure 1.
if VSBPYN is No, then all must be null or =0 (VSBPDT_RAW, VSBPTM1, SYSBP1, DIABP1, VSBPND2, VSBPTM2, SYSBP2, DIABP2, VSBPND3, VSBPTM3, SYSBP3, DIABP3)
This is what I have started with:
proc sql;
select
f.subject,
f.SVSTDT_RAW, f.FolderName,
b.FolderName,
VSBPDT_RAW, VSTPT,
SYSBP1, SYSBP2, SYSBP3,
DIABP1, DIABP2, DIABP3
FROM first_data as f, bp_data as b
group by subject, foldername
where f.subject = b.subject
having VSTPT is Blood Pressure set 1,
VSBPYN is No;
quit;
I just need to be pointed towards the right direction. I know this can't be right.
I do not know the exact structure of your data, so the solution below may need to be modified by you to select the right columns.
From the description, this looks like a good situation for SQL plus a data step. You have a lot of columns to merge with the bp table, and it is easy to merge all of these columns with first_data in SQL.
When you have lots of by-row conditionals, a data step will be easier to work with and read than many CASE statements in SQL. We'll do a two-stage approach in which we use SQL and a data step.
Step 1: Merge the data
proc sql noprint;
create table stage as
select t1.*
, t2.VSBPYN
from bp_data as t1
INNER JOIN
first_data as t2
ON t1.subject = t2.subject
AND t1.foldername = t2.foldername
where t1.VSTPT = 1
;
quit;
Step 2: Conditionally set values to missing
Next, we'll do a data step for our conditional logic. call missing() is a useful call routine that lets you set many variables to missing in a single statement.
data want;
set stage;
if(upcase(VSBPYN) = 'NO') then call missing(VSBPDT_RAW, VSBPTM1, SYSBP1, DIABP1,
VSBPND2, VSBPTM2, SYSBP2, DIABP2,
VSBPND3, VSBPTM3, SYSBP3, DIABP3
);
run;
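If the goal is to report violations rather than blank the values out, a variation on the same data step could flag the offending rows instead. This is only a sketch: the data set name findings and the query_text wording are made up, and it checks for non-missing values only, so the "or =0" part of the specification would still need its own condition.

```sas
data findings;                       /* hypothetical output name */
    set stage;
    length query_text $100;
    /* CMISS counts missing arguments and accepts numeric and character variables. */
    n_missing = cmiss(VSBPDT_RAW, VSBPTM1, SYSBP1, DIABP1,
                      VSBPND2, VSBPTM2, SYSBP2, DIABP2,
                      VSBPND3, VSBPTM3, SYSBP3, DIABP3);
    /* VSBPYN = No means all 12 fields should be empty. */
    if upcase(VSBPYN) = 'NO' and n_missing < 12 then do;
        query_text = 'VSBPYN is No but BP data are present';  /* hypothetical wording */
        output;
    end;
    drop n_missing;
run;
```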
Step 3: Output to Excel
Finally, we send the output to Excel.
proc export
data=want
outfile='/my/location/want.xlsx'
dbms=xlsx
replace;
run;
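Since each check is supposed to land on its own tab, one way to do that is to export each result to the same workbook with a different SHEET= statement. A sketch, assuming a second result data set called want_check2 and the sheet names shown (all hypothetical). In my experience, with dbms=xlsx the replace option replaces the named sheet rather than the whole workbook, but verify that on your SAS version.

```sas
proc export data=want
    outfile='/my/location/want.xlsx'
    dbms=xlsx
    replace;
    sheet='BP1_check';       /* hypothetical tab name for the first check */
run;

proc export data=want_check2 /* hypothetical data set for a second check */
    outfile='/my/location/want.xlsx'
    dbms=xlsx
    replace;
    sheet='BP2_check';       /* hypothetical tab name for the second check */
run;
```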

Proc tabulate grouping Data - Three variables

I have three variables, CONFIG, YEAR, and TOT_SAL. I need all configs in rows, years in columns, and
in each cell the sum of the third variable, TOT_SAL.
So far I am trying this:
PROC TABULATE data=final OUT=work.final;
CLASS CONFIG YEAR;
TABLES CONFIG,YEAR;
Var TOT_SAL;
RUN;
This gives me a cross tab for config and year, but instead of the frequency of config
I need SUM(TOT_SAL) in the cross tab.
Here's an example of how to do that. Since you didn't provide data, I used the SASHELP.SHOES data set so this example can be replicated. If you need further assistance, be sure to post actual sample data.
proc tabulate data=sashelp.shoes;
class region product;
var sales;
table region, product*(sales='')*(sum=''*f=dollar32.);
run;
The first and second examples in the SAS documentation show another method and explain each step in detail.
The simplest answer is adding the VAR statement. Note that you have tot_sal in the CLASS statement. That is incorrect, because the CLASS statement is intended for categorical/grouping variables, not variables to be summarized. Those go in the VAR statement instead.
PROC TABULATE data=final OUT=work.final;
CLASS CONFIG YEAR;
VAR TOT_SAL;
TABLES CONFIG, YEAR*TOT_SAL*(sum=''*f=dollar32.) ;
RUN;
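To try this without the real data, here is a small self-contained sketch. The rows are invented purely for illustration, and it leaves off OUT= so the input data set is not overwritten by the summary.

```sas
/* Made-up sample rows, only to make the example reproducible. */
data final;
    input config $ year tot_sal;
    datalines;
A 2019 100
A 2019 50
A 2020 150
B 2019 200
B 2020 250
;
run;

proc tabulate data=final;
    class config year;      /* grouping variables */
    var tot_sal;            /* analysis variable to be summed */
    table config, year*tot_sal=''*sum=''*f=dollar12.;
run;
```

Each cell then holds the total of tot_sal for that config and year, so config A in 2019 combines its two rows.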

Iteratively adding to merged SAS dataset

I have 18 separate datasets that contain similar information: patient ID, number of 30-day equivalents, and total day supply of those 30-day equivalents. I've output these from a dataset that contains those 3 variables plus the medication class (VA_CLASS) and the quarter it was captured in (a total of 6 quarters).
Here's how I've created the 18 separate datasets from the snip of the dataset shown above:
%macro rx(class,num);
proc sql;
create table dm_sum&class._qtr&num as select PatID,
sum(equiv_30) as equiv_30_&class._&num
from dm_qtrs
where va_class = "HS&class" and dm_qtr = &num
group by 1;
quit;
%mend;
%rx(500,1);
%rx(500,2);
%rx(500,3);
%rx(500,4);
%rx(500,5);
%rx(500,6);
%rx(501,1);
and so on...
I then need to merge all 18 datasets back together by PatID. What I'd like to do is iteratively add each newly created dataset to the previous result, as in, add dataset dm_sum_500_qtr3 to a file that already contains the results of dm_sum_500_qtr1 and dm_sum_500_qtr2.
Thanks for looking, Brian
In the macro, append the created data set to an accumulator data set. Be sure to delete the accumulator before starting so there is a fresh accumulation. If the process is run at different times (like weekly or monthly), you may want to incorporate a unique index to prevent repeated appends. If you are stacking all these sums, the create table should also select va_class and dm_qtr.
%macro rx(class, num, stack=perm.allClassNumSums);
proc sql; create table dm_sum&class._qtr&num as … ; quit;
proc append force base=&stack data=dm_sum&class._qtr&num;
run;
%mend;
proc sql;
drop table perm.allClassNumSums;
quit;
%rx(500,1)
%rx(500,2)
%rx(500,3)
%rx(500,4)
%rx(500,5)
…
A better approach might be a single query with a larger WHERE clause that leaves the class and qtr as categorical variables. Your current approach moves data (class and qtr) into metadata (column names). Such a transformation makes additional downstream processing more difficult.
Proc TABULATE or REPORT can use a CLASS statement to assist the creation of output having category-based columns. These procedures might even be able to work directly with the original data set and not require a preparatory SQL query.
proc sql;
create table want as
select
PatID, va_class, dm_qtr,
sum(equiv_30) as equiv_30_sum
from dm_qtrs
where catx(':', va_class, dm_qtr) in
(
'HS500:1'
'HS500:2'
'HS500:3'
…
'HS501:1'
)
group by PatID, va_class, dm_qtr;
quit;
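If a one-row-per-patient layout is still needed at the end, the long table above can be pivoted in one step instead of merging 18 data sets. A sketch, assuming the want table from the query above; want_wide is a hypothetical name. The generated columns come out as prefix plus class plus quarter (along the lines of equiv_30_HS500_1), which differs slightly from the naming in the question.

```sas
/* PROC TRANSPOSE with BY needs the data sorted by the BY variable. */
proc sort data=want;
    by PatID va_class dm_qtr;
run;

proc transpose data=want
               out=want_wide(drop=_name_)   /* want_wide is a hypothetical name */
               prefix=equiv_30_
               delimiter=_;
    by PatID;
    id va_class dm_qtr;      /* column names are built from both ID values */
    var equiv_30_sum;
run;
```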

How to calculate regression coefficient and put it into each row of a table

I have a SQL that would create for each customer a short excerpt of his history. Suppose the columns I am interested in are TIMESTAMP and PURCHASE VALUE. I'd like to calculate a linear regression for each customer and put this value into a table.
proc sql;
create table CUSTOMERHISTORY as
select
TIME_STAMP
,PURCHASE_VALUE
,CUSTOMER_ID
from <my data source>
;quit;
The table is quite large; it would be best if the table didn't have to be loaded into RAM prior to computation.
I tried
proc reg
data = CUSTOMERHISTORY;
model PURCHASE_VALUE=TIME_STAMP;
outest = OUTTABLE;
by CUSTOMER_ID;
but it never wrote anything to the OUTTABLE. (I found parameter outest in http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_reg_sect007.htm )
According to the documentation you link to, outest is a parameter that you should give as an option on the proc reg statement. So to get that specific output, your code should look like this:
proc reg
data = CUSTOMERHISTORY
outest = OUTTABLE;
model PURCHASE_VALUE=TIME_STAMP;
by CUSTOMER_ID;
run;
Note that there is no semicolon between data = ... and outest = ....
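To get the coefficient onto each row, one option is to join the OUTEST= table back to the history. A sketch under a few assumptions: BY processing needs the data sorted by CUSTOMER_ID, the output name CUSTOMERHISTORY_COEF and the column names SLOPE and INTERCEPT_EST are made up, and it relies on OUTEST= storing each coefficient in a variable named after its regressor (with the intercept in Intercept).

```sas
proc sort data=CUSTOMERHISTORY;
    by CUSTOMER_ID;
run;

/* NOPRINT suppresses the per-customer printed output, which matters for large tables. */
proc reg data=CUSTOMERHISTORY outest=OUTTABLE noprint;
    by CUSTOMER_ID;
    model PURCHASE_VALUE = TIME_STAMP;
run;
quit;

proc sql;
    create table CUSTOMERHISTORY_COEF as      /* hypothetical name */
    select h.*,
           o.TIME_STAMP as SLOPE,             /* coefficient on TIME_STAMP */
           o.Intercept  as INTERCEPT_EST
    from CUSTOMERHISTORY as h
         left join OUTTABLE as o
         on h.CUSTOMER_ID = o.CUSTOMER_ID;
quit;
```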