PROC REPORT : Is it possible to don't print a specific row? - sas

I am writing a report to Excel with PROC REPORT. the first column is grouped, and I add a break line before some values of it. This break line contains the value of the column if it match some conditions.
Eg.
My table contains this rows :
nom_var | val1 | val2 | val3 |
_____________________________________________________
Identification | . | . | . |
Name | Ou. Dj. | . | . |
date B. | 00/01/31 | . | . |
NAS | 1122334 | . | . |
Revenues | . | . | . |
| R1 1250 $ | R2 1000 $ | . |
_____________________________________________________
In the report I have :
_____________________________________________________
Identification
_____________________________________________________
Identification | . | . | . |
Name | Ou. Dj. | . | . |
date B. | 00/01/31 | . | . |
NAS | 1122334 | . | . |
____________________________________________________
Revenues
_____________________________________________________
Revenues | . | . | . |
| R1 1250 $ | R2 1000 $ | . |
_____________________________________________________
Please, how can I revove the lines containing "Identification" and "Revenues" in the first column "nom_var"?
I mean :
Identification | . | . | . |
and
Revenues | . | . | . |
Here is my code :
ods listing close;
*options générales;
options topmargin=1in bottommargin=1in
leftmargin=0.25in rightmargin=0.25in
;
%let fi=%sysfunc(cat(%sysfunc(compress(&nom)),_portrait_new.xls));
ods tagsets.ExcelXP path="&cheminEx." file="&fi" style=seaside
options(autofit_height="yes"
pagebreaks="yes"
orientation="portrait"
papersize="letter"
sheet_interval="none"
sheet_name="Infos Contribuable"
WIDTH_POINTS = "12" WIDTH_FUDGE = ".0625" /* absolute_column_width est en pixels*/
absolute_column_width="120,180,160,150"
);
ods escapechar="^";
*rapport1;
/*contribuable*/
proc report data=&lib..portrait nowindows missing spanrows noheader
style(report)=[frame=box rules=all
foreground=black Font_face='Times New Roman' font_size=10pt
background=none]
style(column)=[Font_face='Times New Roman' font_size=10pt just=left]
;
/*entête du tableau est la première variable de la table ==> à gauche du rapport */
define nom_var / group order=data style(column)=[verticalalign=middle
background=#e0e0e0 /* gris */
foreground=blue
fontweight=bold
];
/* Contenu */
define valeur_var1 / style(column)=[verticalalign=top];
define valeur_var2 / style(column)=[verticalalign=top];
define valeur_var3 / style(column)=[verticalalign=top];
compute before nom_var / style=[verticalalign=middle background=#e0e0e0
foreground=blue fontweight=bold font_size=12pt];
length rg $ 50;
if nom_var in ("Identification","Actifs", "Revenus") then do;
rg= nom_var;
len=50;
end;
else do;
rg="";
len=0;
end;
line rg $varying50. len;
endcomp ;
title j=center height=12pt 'Portrait du contribuable';
run;
ods tagsets.ExcelXP close;
ods listing;

You have a artificial data construct that is not in a categorical form appropriate to the task of outputting your informative line.
This sample shows how a DATA Step can tweak the data so you have a mySection variable that organizes the rows introduced by the nom_var row of interest (Identification and Revenues)
The new arrangement of data is more suited for the task you are undertaking.
data have;
length nom_var val1 val2 val3 $50;
infile cards dlm='|';
input
nom_var val1 val2 val3 ;
datalines;
Identification | . | . | . |
Name | Ou. Dj. | . | . |
date B. | 00/01/31 | . | . |
NAS | 1122334 | . | . |
Revenues | . | . | . |
| R1 1250 $ | R2 1000 $ | . |
run;
Tweak original data so there is a categorical mySection
data need;
set have;
retain mySection;
select (nom_var);
when ('Identification') mySection = nom_var;
when ('Revenues') mySection = nom_var;
otherwise OUTPUT; * NOTE: Explicit OUTPUT means there is no implicit OUTPUT, which means the rows that do mySection= are not output;
end;
run;
Use the new variable (mySection) for grouping (compute before), but keep it's column hidden (noprint)
proc report data=need;
column mySection nom_var val1 val2 val3;
define mySection / group noprint;
compute before mySection;
line mySection $50.;
endcomp;
run;

Related

Construct continuous intervals from non-contiguous validity ranges

Given
data have;
infile datalines missover delimiter="|" dsd;
input id :$20. (start_date end_date) (:date9.) (attribute_1 attribute_2 attribute_3 attribute_4) ($);
format start_date end_date date9.;
datalines;
ID1|01MAR2014|31DEC9999|BIG|YES||
ID2|01SEP2015|30NOV2020|||TWO|
ID2|01SEP2015|31DEC9999|SMALL|||
ID2|01AUG2021|31DEC9999|||TWO|
ID3|01DEC2014|31MAY2016||YES||
ID3|01DEC2014|29JUN2017||||OK
ID3|01DEC2014|31DEC9999|MEDIUM|||
ID3|31MAR2015|29SEP2017|||ONE|
ID3|30JUN2017|31DEC9999||YES||TBD
ID3|30SEP2017|31DEC9999|||ONE, TWO|
;
I would like to get continuous validity ranges for each id with the correct attributes at each of them.
The desired output would be like this:
+-----+------------+-----------+-------------+-------------+-------------+-------------+
| id | start_date | end_date | attribute_1 | attribute_2 | attribute_3 | attribute_4 |
+-----+------------+-----------+-------------+-------------+-------------+-------------+
| ID1 | 01MAR2014 | 31DEC9999 | BIG | YES | | |
| ID2 | 01SEP2015 | 30NOV2020 | SMALL | | TWO | |
| ID2 | 01DEC2020 | 31JUL2021 | SMALL | | | |
| ID2 | 01AUG2021 | 31DEC9999 | SMALL | | TWO | |
| ID3 | 01DEC2014 | 30MAR2015 | MEDIUM | YES | | OK |
| ID3 | 31MAR2015 | 31MAY2016 | MEDIUM | YES | ONE | OK |
| ID3 | 01JUN2016 | 29JUN2016 | MEDIUM | | ONE | OK |
| ID3 | 30JUN2016 | 29SEP2017 | MEDIUM | YES | ONE | TBD |
| ID3 | 30SEP2017 | 31DEC9999 | MEDIUM | YES | ONE, TWO | TBD |
+-----+------------+-----------+-------------+-------------+-------------+-------------+
I found a way of doing it using joins, but would like to know if there exist a better way of doing the following:
data all_intervals;
set have(keep= id start_date end_date);
_start = start_date; output;
_end = start_date-1; output;
_end = end_date; output;
if end_date < '31DEC9999'd then do;
_start = end_date+1; output;
end;
run;
proc sql;
create table all_intervals as
select distinct t1.id, t1._start, t2._end
from all_intervals t1, all_intervals t2
where t1.id = t2.id and t2._end > t1._start
;
quit;
data all_intervals;
set all_intervals;
by id _start;
if first.id or first._start;
run;
proc sql noprint;
select 'max(t1.'||NAME||') as '||NAME into :attributes separated by ','
from sashelp.vcolumn
where libname = "WORK" and memname = "HAVE" and upcase(name) not in ('ID', "_START", "_END")
;
quit;
proc sql;
create table merge as
select t2.id, t2._start as start_date, t2._end as end_date, &attributes.
from have t1 right join all_intervals t2
on t1.id = t2.id
and ((t2._start <= t1.start_date <= t2._end)
or (t2._start <= t1.end_date <= t2._end)
or (t2._start >= t1.start_date and t2._end <= t1.end_date))
group by t2.id, t2._start
order by t2.id, t2._start
;
quit;
proc sort data=merge out=want nodupkey; by id start_date end_date; run;
The above produce the expected output.

Grouping child items and displaying parent sum

I have the following table
+-------+--------+---------+
| group | item | value |
+-------+--------+---------+
| 1 | a | 10 |
| 1 | b | 20 |
| 2 | b | 30 |
| 2 | c | 40 |
+-------+--------+---------+
I would like to group the table by group, insert the grouped sum into value, and then ungroup:
+-------+--------+
| item | value |
+-------+--------+
| 1 | 30 |
| a | 10 |
| b | 20 |
| 2 | 70 |
| b | 30 |
| c | 40 |
+-------+--------+
The purpose of the result is to interpret the first column as items a and b belonging to group 1 with sum 30 and items b and c belonging to group 2 with sum 70.
Such a data transformation can be indicative of a reporting requirement more than a useful data structure for downstream processing. Proc REPORT can create output in the form desired.
data have;
infile datalines;
input group $ item $ value ##; datalines;
1 a 10 1 b 20 2 b 30 2 c 40
;
proc report data=have;
column group item value;
define group / order order=data noprint;
break before group / summarize;
compute item;
if missing(item) then item=group;
endcomp;
run;
I assume that both group and item are character variables
data have;
infile datalines firstobs=4 dlm='|';
input group $ item $ value;
datalines;
+-------+--------+---------+
| group | item | value |
+-------+--------+---------+
| 1 | a | 10 |
| 1 | b | 20 |
| 2 | b | 30 |
| 2 | c | 40 |
+-------+--------+---------+
;
data want (keep=group value);
do _N_=1 by 1 until (last.group);
set have;
by group;
v + value;
end;
value = v;output;v=0;
do _N_=1 to _N_;
set have;
group = item;
output;
end;
run;

Identifying overlap medication use

Asked on SAS communitiesas well , havent gotten a correct response.
https://communities.sas.com/t5/SAS-Programming/Identifying-overlap-medication-use/m-p/628115#M185541
I have a problem similar to the problem in -
https://communities.sas.com/t5/SAS-Programming/Concomitant-drug-medication-use/m-p/339879#M77587
However I have an issue , I have overlapping of same drug as well -
Eg:
+----+------+-----------+-----------+-----------+
| ID | DRUG | START_DT | DAYS_SUPP | END_DT |
+----+------+-----------+-----------+-----------+
| 1 | A | 2/17/2010 | 30 | 3/19/2010 |
| 1 | A | 3/17/2010 | 30 | 4/16/2010 |
| 1 | A | 4/12/2010 | 30 | 5/12/2010 |
| 1 | A | 8/20/2010 | 30 | 9/19/2010 |
| 1 | B | 5/6/2009 | 30 | 6/5/2009 |
+----+------+-----------+-----------+-----------+
Here the three A prescriptions are over lapping .
So using the code in the link gives me combinations like A-A-B
whereas I don't want that.
However I want to account for the overlapping days for drug A. So I want to shift the second row prescription to 3/20/2010 to 4/19/2010. Similarly for 3rd A prescription.
the code I have tried -
data have2;
set have_sorted1;
format NEW_START_DT NEW_END_DT _lagEND_DT date9.;
_lagID = lag(patient_ID);
_lagDRUG = lag(drg_cls);
_lagEND_DT = lag(rx_ed_dt);
if patient_ID = _lagID and drg_cls= _lagDRUG and rx_st_dt <= _lagEND_DT then flag=1;
else flag = 0;
retain NEW_START_DT NEW_END_DT;
if flag=0 then do;
NEW_START_DT = rx_st_dt;
NEW_END_DT = rx_ed_dt;
end;
else do;
New_start_dt = NEW_End_DT + 1;
NEW_END_DT = new_start_dt + DAY_SUPP ;
end;
/* drop flag _:;*/
run;
But even then I get incorrect result -
id Drug drug_start day_supp drug_end New_start New_end
15 A 6-Sep-15 30 5-Oct-15 6-Sep-15 5-Oct-15
15 A 24-Sep-15 90 22-Dec-15 6-Oct-15 4-Jan-16
15 A 6-Dec-15 90 4-Mar-16 5-Jan-16 4-Apr-16
15 A 26-Feb-16 90 25-May-16 5-Apr-16 4-Jul-16
15 A 29-May-16 90 26-Aug-16 29-May-16 26-Aug-16
15 A 7-Dec-16 90 6-Mar-17 7-Dec-16 6-Mar-17
15 A 17-Feb-17 90 17-May-17 7-Mar-17 5-Jun-17
It might be easier to track the 'flag' state implicitly in a shift variable that tracks how many days to shift forward.
Example:
Shift is always applied, but will be zero when no overlap occurs. The prior end, after computation, is tracked in a retained variable. The code does not need to rely on LAG.
data have;
infile cards firstobs=3 dlm='|';
input ID DRUG: $ START_DT: mmddyy10. DAYS_SUPP END_DT: mmddyy10.;
format start_dt end_dt mmddyy10.;
datalines;
| ID | DRUG | START_DT | DAYS_SUPP | END_DT |
+----+------+-----------+-----------+-----------+
| 1 | A | 2/17/2010 | 30 | 3/19/2010 |
| 1 | A | 3/17/2010 | 30 | 4/16/2010 |
| 1 | A | 4/12/2010 | 30 | 5/12/2010 |
| 1 | A | 8/20/2010 | 30 | 9/19/2010 |
| 1 | B | 5/6/2009 | 30 | 6/5/2009 |
;
data want;
set have;
by id drug;
retain shift prior_shifted_end;
select;
when (first.drug) shift = 0;
when (prior_shifted_end > start_dt) shift = prior_shifted_end - start_dt + 1;
otherwise shift = 0;
end;
original_start_dt = start_dt;
original_end_dt = end_dt;
start_dt + shift;
end_dt + shift;
prior_shifted_end = end_dt;
format prior: original: mmddyy10.;
run;

How to split a combination of numeric and characters into multiple columns

I want to split some variable "15to16" into two columns where for that row I want the values 15 and 16 in each of the column entries. Hence, I want to get from this
+-------------+
| change |
+-------------+
| 15to16 |
| 9to8 |
| 6to5 |
| 10to16 |
+-------------+
this
+-------------+-----------+-----------+
| change | from | to |
+-------------+-----------+-----------+
| 15to16 | 15 | 16 |
| 9to8 | 9 | 8 |
| 6to5 | 6 | 5 |
| 10to16 | 10 | 16 |
+-------------+-----------+-----------+
Could someone help me out? Thanks in advance!
data have;
input change $;
cards;
15to16
9to8
6to5
10to16
;
run;
data want;
set have;
from = input(scan(change,1,'to'), 8.);
to = input(scan(change,2,'to'), 8.);
run;
N.B. in this case the scan function is using both t and o as separate delimiters, rather than looking for the word to. This approach still works because scan by default treats multiple consecutive delimiters as a single delimiter.
Regular expressions with the metacharacter () define groups whose contents can be retrieved from capture buffers with PRXPOSN. The capture buffers retrieved in this case would be one or more consecutive decimals (\d+) and converted to a numeric value with INPUT
data have;
input change $20.; datalines;
15to16
9to8
6to5
10to16
run;
data want;
set have;
rx = prxparse('/^\s*(\d+)\s*to\s*(\d+)\s*$/');
if prxmatch (rx, change) then do;
from = input(prxposn(rx,1,change), 12.);
to = input(prxposn(rx,2,change), 12.);
end;
drop rx;
run;
You can get the answer you want by declaring delimiter when you create the dataset. However you did not provide enough information regarding your other variables and how you import them
Data want;
INFILE datalines DELIMITER='to';
INPUT from to;
datalines;
15to16
9to8
6to5
10to16
;
Run;

Compare Value of Current Observation with First Observation

I have a set of multiple choice responses from a survey with 45 questions, and I've placed the correct responses as my first observation in the dataset.
In my DATA step I would like to set values to 0 or 1depending on whether the variable in each observation matches the same variable in the first observation, I want to replace the response letter (A-D) with the 0 or 1 in the dataset, how do I go about doing that comparison?
I'm not doing any grouping, so I believe I can access the first row using First.x, but I'm not sure how to compare that across each variable(answer1-answer45).
| Id | answer1 | answer2 | ...through answer 45
|:-------------|---------:|
| KEY | A | B |
| 2 | A | C |
| 3 | C | D |
| 4 | A | B |
| 5 | D | C |
| 6 | B | B |
Should become:
| Id | answer1 | answer2 | ...through answer 45
|:-------------|---------:|
| KEY | A | B |
| 2 | 1 | 0 |
| 3 | 0 | 0 |
| 4 | 1 | 1 |
| 5 | 0 | 0 |
| 6 | 0 | 1 |
Current code for reading in the data:
DATA TEST(drop=name fill answer0);
INFILE SCORES DSD firstobs=2;
length id $4;
length answer1-answer150 $1;
INPUT name $ fill id $ (answer0-answer150) ($);
RUN;
Thanks in advance!
Here's how I might do it. Create a data set to PROC COMPARE the KEY to the observed. Then you have X for not matching key and missing for matched. You can then use PROC TRANSREG to score the 'X.' to 01. PROC TRANSREG also creates macro variables which contain the names of the new variables and the number.
From log NOTE: _TRGINDN=2 _TRGIND=answer1D answer2D
data questions;
input id:$3. (answer1-answer2)(:$1.);
cards;
KEY A B
2 A C
3 C D
4 A B
5 D C
6 B B
;;;;
run;
data key;
if _n_ eq 1 then set questions(obs=1);
set questions(keep=id firstobs=2);
run;
proc compare base=key compare=questions(firstobs=2) out=comp outdiff noprint;
id id;
run;
options validvarname=v7;
proc transreg design data=comp(drop=_type_ type=data);
id id;
model class(answer:) / noint;
output out=scored(drop=intercept _:);
run;
%put NOTE: &=_TRGINDN &=_TRGIND;
I don't have my SAS license here at home, so I can't actually test this code. I'll give it me best shot, though ...
First, I'd keep my correct answers in a separate table, and then merge it with the answers from the respondents. That also makes the solution scalable, should you have more multiple choice solutions and answers in the same table, since you'd be joining on the assignment ID as well.
Now, import all your correct answers to a table answers_correct with column names answer_correct1-answer_correct45.
Then, merge the two tables and determine the outcome for each question.
DATA outcome;
MERGE answers answers_correct;
* We will not be using any BY.;
* If you later add more questionnaires, merge BY the questionnaire ID;
ARRAY answer(*) answer1-answer45;
ARRAY answer_correct(*) answer_correct1-answer_correct45;
LENGTH result1-result45 $1;
ARRAY result(*) result1-result45;
DROP i;
FOR i = 1 TO DIM(answer);
IF answer(i) = answer_correct(i) THEN result(i) = '1';
ELSE result(i) = '0';
END;
RUN;