Data manipulation using proc sql in SAS

I have the below table
I calculate the standard deviation of y and z from 2001 Q4 to 2003 Q2.
I have to create a new table which should look like below
I tried using the case statement inside proc sql, but it did not work. Any assistance will be appreciated.
proc sql;
create table tablenew as
select
Date,
X,
case when Date >= "2003Q3" then (y+std(y)) else y end as y,
case when Date >= "2003Q3" then (z-std(z)) else z end as z
from have
;
quit;
But here the standard deviation of the entire column is calculated; I only want the standard deviation of columns y and z from 2001 Q3 to 2003 Q2.

Using only SQL is going to make this problem more difficult than it needs to be, but the issue is that you also need to wrap the value you send to the STD() function in a CASE expression. Here I have saved those trimmed values as their own variables so you can see what is happening.
proc sql;
create table tablenew as
select
Date
,X
/* keep y and z only for the rows the standard deviation should use (up to 2003Q2) */
,case when date <= '2003Q2' then y else . end as y_subset
,case when date <= '2003Q2' then z else . end as z_subset
,case when Date >= "2003Q3" then (y+std(calculated y_subset)) else y end as y
,case when Date >= "2003Q3" then (z-std(calculated z_subset)) else z end as z
from have
;
quit;
Much easier to break it down into logical steps and just use normal SAS code instead of SQL.
proc summary data=have ;
where date <= '2003Q2' ; /* rows the standard deviation should use */
var y z ;
output out=std std=y_std z_std ;
run;
data want ;
set have ;
if _n_=1 then set std ;
if date >= '2003Q3' then do;
y=y+y_std;
z=z-z_std;
end;
run;
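If the whole calculation has to stay in PROC SQL, one alternative is to compute the two standard deviations once in an inline view and join that single row to every row of have. This is only a sketch: it assumes Date is a character variable holding values such as 2003Q3 and that the standard deviation should use every row up to 2003Q2. The names want_sql, h, s, y_std and z_std are placeholders, not part of the original post.

```sas
proc sql;
  create table want_sql as
  select h.Date
       , h.X
       /* adjust only the rows from 2003Q3 onward */
       , case when h.Date >= '2003Q3' then h.y + s.y_std else h.y end as y
       , case when h.Date >= '2003Q3' then h.z - s.z_std else h.z end as z
  from have as h
     , (select std(y) as y_std, std(z) as z_std
        from have
        where Date <= '2003Q2') as s   /* one-row inline view */
  ;
quit;
```

Because the inline view returns exactly one row, the comma join simply attaches y_std and z_std to each row of have. SAS writes a note about a Cartesian product to the log, which is expected here.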

Related

Finding the max value of a variable in SAS per ID per time period

proc sql;
create table abc as select distinct formatted_date ,Contract, late_days
from merged_dpd_raw_2602
group by 1,2
;quit;
This gives me the 3 variables I'm working with.
They have the form
|ID|Date in YYMMs.10| number|
proc sql;
create table max_dpd_per_contract as select distinct contract, max(late_days) as DPD_for_contract
from sasa
group by 1
;quit;
this gives me the maximum number for the entire period but how do I go on to make it per period?
I'm guessing the timeseries procedure should be used here.
proc timeseries data=sasa
out=sasa2;
by contract;
id formatted_date interval=day ACCUMULATE=maximum ;
trend maximum ;
var late_days;
run;
but I am unsure how to continue.
I want to find the maximum value of the variable late_days per given time period (month). So for contract A, for the time period Jan 2018, the max late_days value is X.
How the data looks: https://imgur.com/iIufDAx
In SQL you will want to calculate your aggregate within a group that uses a computed month value.
Example:
data have;
call streaminit(2021);
length contract date days_late 8;
do contract = 1 to 10;
days_late = 0;
do date = '01jan2020'd to '31dec2020'd;
if days_late then
if rand('uniform') < .55 then
days_late + 1;
else
days_late = 0;
else
days_late + rand('uniform') < 0.25;
output;
end;
end;
format date date9.;
run;
options fmterr;
proc sql;
create table want as
select
contract
, intnx('month', date, 0) as month format = monyy7.
, max(days_late) as max_days_late
from
have
group by
contract, month
;
quit;
You will get the same results using Proc MEANS
proc means nway data=have noprint;
class contract date;
format date monyy7.;
output out=want_2 max(days_late) = max_days_late;
run;
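The SQL version above groups on intnx('month', date, 0), which maps every date to the first day of its month, so all days in a month share one grouping value. A small sketch of that behaviour, using an arbitrary date chosen only for illustration:

```sas
data _null_;
  d = '17mar2020'd;              /* arbitrary example date */
  m = intnx('month', d, 0);      /* shift by 0 months, aligned to the start of the month */
  put d= date9. m= date9.;       /* log shows d=17MAR2020 m=01MAR2020 */
run;
```

The monyy7. format in the query only changes how that first-of-month value is displayed; the grouping itself happens on the underlying date value.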

Populate SAS macro-variable using a SQL statement within another SQL statement?

I stumbled upon the following code snippet in which the variable top3 has to be filled from a table have rather than from an array of numbers.
%let top3 = 14 15 42; /* This should be made obsolete.. */
%let no = 3;
proc sql;
create table want as
select *
from (select x, y from foo) a
%do i = 1 %to &no.;
%let current = %scan(&top3.,&i.); /* What do I need to put here? */
left join (select x, y from bar where z=&current.) row_&current.
on a.x = row_&current..x
%end;
;
quit;
The table have contains the xs from the string and looks as follows:
i x
1 14
2 15
3 42
I am now wondering how I should modify the %let current = ... line such that current is populated from the table have. I know how to populate a macro variable using proc sql with select .. into, but I am afraid that the way I am going right now is fully against SAS philosophy.
It looks like you're more or less transposing something. If that's the case, this is doable in macro/sql pretty easily.
First, here's the simple version - no macro.
proc sql;
create table class_t as
select * from (
select name from sashelp.class ) class
left join (
select name, age as age_Alfred
from sashelp.class
where name='Alfred') Alfred
on class.name = Alfred.name
;
quit;
We grab the value of age from the Alfred row and put it on the main join. This isn't exactly what you're doing, but it seems similar. (I'm just using one table, but you can of course use two here.)
Now, how do we extend this to be table-driven and not handwritten? Macros!
First, here's the macro - just taking the Alfred bit and making it generic.
%macro joiner(name=);
left join (
select name, age as age_&name.
from sashelp.class
where name="&name.") &name.
on class.name = &name..name
%mend joiner;
Second, we look at this and see two things we need to put into macro lists: the SELECT variable list (we'll get one new variable for each call), and the JOIN list.
proc sql;
select cats('%joiner(name=',name,')')
into :joinlist separated by ' '
from sashelp.class;
select cats(name,'.age_',name)
into :selectlist separated by ','
from sashelp.class;
quit;
And then, we just call it!
proc sql;
create table class_t as
select class.name,&selectlist. from (
select name from sashelp.class) class
&joinlist.
;
quit;
Now, the dataset you build the macro lists from is perhaps the dataset with the 3 rows in it that you show above ("have"). The dataset you actually get the appended data from is some other dataset ("bar"), right? And the one you join to is perhaps a third dataset ("foo"). Here I just use one dataset for simplicity, but the concept is the same, just with different sources.
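Applied to the question's own tables, the hardcoded top3 list could be filled from have with the same select ... into technique. A minimal sketch, assuming have contains the columns i and x exactly as shown in the question:

```sas
proc sql noprint;
  /* build the space-separated list of x values, in the order given by i */
  select x into :top3 separated by ' '
  from have
  order by i;
quit;

/* SQLOBS holds the number of rows processed by the last PROC SQL statement */
%let no = &sqlobs;
%put &=top3 &=no;
```

The existing %do loop could then stay as it is, provided it runs inside a macro definition, since %do loops are not allowed in open code.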
When the lookup data is in a table you can perform a three way join without any need for SAS Macro. You don't provide any data so the example will mock some.
Example:
Suppose a master record has several associated detail records, and the detail records contain a z value used for selection into a result set per a wanted z lookup table.
data masters;
call streaminit(2020);
do id = 1 to 100;
do x = 1 to 100;
m_rownum + 1;
code = rand('integer', 10,45);
output;
end;
end;
run;
data details;
call streaminit(2020);
do date = 1 to 20;
do x = 1 to 100;
do rep = 1 to 5;
d_rownum + 1;
amount = rand('integer', 100,200);
z = rand('integer', 10,45);
output;
end;
end;
end;
run;
data zs;
input z @@; datalines;
14 15 42
;
proc sql;
create table want as
select
m_rownum
, d_rownum
, masters.id
, masters.x
, masters.code
, details.z
, details.date
, details.amount
from
masters
left join
details
on
details.x = masters.x
inner join
zs
on
zs.z = details.z
order by
masters.id, masters.x, details.z, details.date
;
quit;

Sort all rows by length of string in variable X (longer strings first)

I have a variable UserName that contains IDs of variable length. A shortened example:
How can I sort all rows by variable X so that longer strings are listed first?
Context: This is for calculating HEI 2015 scores using the ASA24 macro. It writes:
/*Note: Some users have found that the SAS program will drop observations from the analysis if the ID field is not the same length for all observations. To prevent this error, the observations with the longest ID length should be listed first when the data is imported into SAS. */
Proc SQL with an ORDER BY clause specifying an ordering value computed in a CASE expression.
The computation when length(X) > 8 then -length(X) else 0 ensures longest values are first when sorted and all value lengths <= some-capping-length (8) are treated equally
ORDER BY length(X) desc, X would also select longest X values first and then by X itself, but length would predominate ordering even when value lengths < 8.
data have;
length X $50;
input X; datalines;
GFHsp036
GFHsp038
GFHsp039
GFHsp040
GFHsp0400
GFHsp0401
GFHsp0402
GFHsp04021
;
proc sql;
create table want as
select * from have
order by
case when length(x) > 8 then -length(X) else 0 end,
X
;
quit;
proc print;
var X / style=[fontfamily='Courier'];
run;
Here is probably the simplest way to do this
data have;
input string $;
datalines;
abcde
ab
a
abcd
abc
;
proc sql;
create table want as
select * from have
order by length(string) desc;
quit;
Re-ordering IDs did not help in my case, as PROC IMPORT needed GUESSINGROWS = MAX.
Please see "SAS Macro Truncating IDs" for how to fix the truncated IDs that this question attempted to address.
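For reference, a minimal sketch of a PROC IMPORT call with that setting. The file name ids.csv and the output data set work.ids are placeholders, and the CSV layout is an assumption, since the original import step is not shown:

```sas
proc import datafile="ids.csv"   /* hypothetical input file */
    out=work.ids                 /* hypothetical output data set */
    dbms=csv
    replace;
  guessingrows=max;              /* scan all rows before deciding column types and lengths */
run;
```

With GUESSINGROWS=MAX the character length of the ID column is taken from the longest value in the file rather than from the first rows only, so row order no longer matters.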

Bar chart with 2 variable on x axis and 1 in Y axis

I want to create a bar chart on yearly death count (based on gender). I want to plot gender and year on x axis and count on Y axis. Can you kindly help how to modify the below code?
TITLE 'DEATH GRAPH BY GENDER';
PROC SGPLOT DATA = DREPORT;
VBAR deathcount / GROUP = gender GROUPDISPLAY = CLUSTER;
RUN;
I am not able to put deathyear in the Y axis. Kindly frame the code.
The VBAR variable is the mid-point values to show on the horizontal axis.
Are you sure that is what you want?
Do you really want to know how many times a given death count occurred over all the years?
You probably want deathcount as the response.
Consider this example:
data have_raw;
do id = 1 to 1000;
gender = substr('MF',1 + 2 * ranuni(123),1);
year = 2019 - floor (30 * ranuni(123));
output;
end;
run;
proc sql;
create table have as
select year, gender, count(*) as deathcount
from have_raw
group by year, gender
;
quit;
proc sgplot data=have;
vbar gender
/ response=deathcount
group=year
groupdisplay=cluster
;
run;
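If the years should run along the horizontal axis with one bar per gender inside each year, the roles of the two variables can be swapped. A sketch using the same have table built above:

```sas
proc sgplot data=have;
  vbar year                      /* one cluster per year on the X axis */
    / response=deathcount        /* bar height = death count */
      group=gender               /* one bar per gender within each year */
      groupdisplay=cluster
  ;
run;
```

Which of the two layouts is preferable depends on whether the comparison of interest is between years or between genders.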

Referring to a transposed column name which is referenced by a macro variable

Referring to the code below: after I transpose a data set (output qc2), I tried to create a percentage column (most_recent_wk_percent_change), but the value of the column is 12.5% and two new columns, &week3. and &week2., are created. The expected result is a calculation based on the values in the week2 and week3 columns. I know the problem could be how the two columns are referenced in the percentage calculation, ( &week3. - &week2.)/&week2.; but I couldn't work out the correction. Please advise :)
%let week1 = 7;
%let week2 = 8;
%let week3 = 9;
proc sql;
create table qc as
select t_week, prod_cat, sum(sales) as sales
from master_table
where t_week in (&week1.,&week2.,&week3.)
group by 1,2
order by 2;
quit;
proc transpose data= qc out=qc2;
by prod_cat ;
id t_week;
run;
data qc2;
set qc2;
format most_recent_wk_percent_change PERCENT7.1;
most_recent_wk_percent_change = ( &week3. - &week2.)/&week2.;
run;
qc:
t_week|prod_cat|sales
7|cat|100
8|cat|200
9|cat|300
7|dog|150
8|dog|400
9|dog|300
7|rat|200
8|rat|600
9|rat|300
qc2: (TRANSPOSED TABLE --> note the column names 7, 8, 9, which are expected)
prod_cat|7|8|9
cat|100|200|300
dog|150|400|300
rat|200|600|300
qc2: (i wanted to get the change in % )
prod_cat|7|8|9|most_recent_wk_percent_change|&week2.|&week3.
cat|100|200|300|12.5%|.|.| ==> 12.5% is wrong. should be 50% (300-200)/(200)
dog|150|400|300|12.5%|.|.| ==> 12.5% is wrong. should be -25%
rat|200|600|300|12.5%|.|.| ==> 12.5% is wrong. should be -50%
I have no idea what you are doing or why, but if you have set VALIDVARNAME=any and the actual name of your variable is 7 and you try to use it in SAS code like this:
ratio = 7/8 ;
Then SAS will assume you mean the numeric value 7.
You need to use a name literal instead.
ratio = '7'n / '8'n ;
So you want
most_recent_wk_percent_change = ("&week3"n-"&week2"n)/"&week2"n;
If instead the actual name of the variable is _7 then you need to code this way.
most_recent_wk_percent_change = (_&week3.-_&week2.)/_&week2.;
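Another way to avoid name literals altogether is to have PROC TRANSPOSE create ordinary variable names with the PREFIX= option. The following is a sketch rather than the poster's code: the data step recreates the qc table from the question, and the prefix wk and the data set names qc2_wk and qc3 are arbitrary choices made for illustration.

```sas
data qc;
  input t_week prod_cat $ sales;
  datalines;
7 cat 100
8 cat 200
9 cat 300
7 dog 150
8 dog 400
9 dog 300
7 rat 200
8 rat 600
9 rat 300
;

/* PREFIX=wk turns the ID values 7, 8, 9 into the variables wk7, wk8, wk9 */
proc transpose data=qc out=qc2_wk(drop=_name_) prefix=wk;
  by prod_cat;
  id t_week;
  var sales;
run;

%let week2 = 8;
%let week3 = 9;

data qc3;
  set qc2_wk;
  format most_recent_wk_percent_change percent7.1;
  /* wk&week3. resolves to wk9 and wk&week2. to wk8 */
  most_recent_wk_percent_change = (wk&week3. - wk&week2.) / wk&week2.;
run;
```

With the sample rows this gives 0.5 for cat, -0.25 for dog and -0.5 for rat, which matches the percentages the question expects.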
Try adding a KEEP= option to your last data step; this keeps only the columns you want in the output.
data qc2 (keep= most_recent_wk_percent_change prod_cat);
set qc2;
format most_recent_wk_percent_change PERCENT7.1;
most_recent_wk_percent_change = ( &week3. - &week2.)/&week2.;
run;