I am trying to use the PCTN statistic to compute a simple percentage (a retention rate), but the code below outputs only 100% in my Retention Rate column across the whole report. If I switch to ROWPCTN or COLPCTN, the same code calculates as expected; I only have the issue with PCTN.
PROC TABULATE DATA=TABLE FORMAT=COMMA8.;
FORMAT CCNUM college.;
FORMAT FTPT $PTime.;
VAR FTIC_CNT RET_CNT;
CLASS FTPT / order=data missing preloadfmt;
CLASS CCNUM / order=data missing;
TABLE
/* ROW STATEMENT */
CCNUM = ' ' ALL = 'TOTAL' ,
/* COLUMN STATEMENT */
FTPT = ' ' * (FTIC_CNT = 'FIRST-TIME COHORT' * SUM = ' '
RET_CNT = 'STILL ENROLLED OR COMPLETED' * SUM = ' '
RET_CNT = ' ' * PCTN<FTIC_CNT> = "RETENTION RATE" * F=PERCFMT.) / BOX=_PAGE_ PRINTMISS MISSTEXT='0' ;
RUN;
I've rerun the code using variations of PCTN, ROWPCTN, and COLPCTN, but only PCTN fails to calculate the way I expect. I have also tried rerunning without formatting, etc., but I still get 100% for every value no matter what I try.
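A likely reason for the 100%: PCTN is a percentage of frequency counts (N), so with RET_CNT in the numerator and FTIC_CNT as the denominator definition, both sides of each cell end up counting the same observations. A retention rate is a ratio of sums instead. The Python sketch below contrasts the two calculations using made-up counts (the rows and college names are hypothetical). In PROC TABULATE, a ratio of sums is what I would expect RET_CNT*PCTSUM<FTIC_CNT> to produce, but that statistic is not in the original code, so it is worth confirming against the PROC TABULATE documentation on denominator definitions.

```python
# Hypothetical rows: (college, FTIC_CNT, RET_CNT); values are made up for illustration.
rows = [
    ("College A", 10, 8),
    ("College A", 5, 3),
    ("College B", 4, 2),
]


def pctn_style(rows, college):
    """Ratio of observation counts, which is why a PCTN-based cell can come out at 100%."""
    n_ret = sum(1 for c, _, r in rows if c == college and r is not None)
    n_ftic = sum(1 for c, f, _ in rows if c == college and f is not None)
    return 100.0 * n_ret / n_ftic


def retention_rate(rows, college):
    """Ratio of sums: 100 * sum(RET_CNT) / sum(FTIC_CNT)."""
    ret = sum(r for c, _, r in rows if c == college)
    ftic = sum(f for c, f, _ in rows if c == college)
    return 100.0 * ret / ftic


for college in ("College A", "College B"):
    print(college, pctn_style(rows, college), round(retention_rate(rows, college), 1))
```

With these rows the count-based column is 100.0 for both colleges, while the ratio of sums gives 73.3 and 50.0.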
I am working on this SAS code and need help joining the two tables below; I get errors when I try to join them.
Requirement: left join Table B to Table A.
Table A:
PROC SQL;
create table stand as select distinct
put(datepart(Max(a.REPORT_DATE)),Date9.) as M_Date
, a.BUSINESS_GROUP as PORTF_LEVEL1
, A.SPLIT as PORTF_LEv2
, Count(distinct a.Report_Date) as Number_of_Days
, (B.TOTAL_BREACH/Count(distinct a.Report_Date))*100 as FREQ
, A.MINIMUM_ACCEPTABLE_COUNT
, A.MAX_COUNT
, (case WHEN (B.TOTAL_BREACH/Count(distinct a.Report_Date)) * 100 LT MIN_COUNT
THEN 'TRUE' ELSE 'FALSE' END) as NUMBER__UNDER
, (case WHEN (B.TOTAL_BREACH/Count(distinct a.Report_Date)) * 100 GT MAX_COUNT THEN 'TRUE' ELSE 'FALSE' END) as NUMBER__OVER
from temp a
INNER join
( select BUSINESS_GROUP as PORTF_LEVEL1
,SPLIT AS PORTF_LEv2
,Count(distinct c.Report_Date) as Number_of_Days
from temp c
Inner join temp2 d
on c.Report_Date=d.Report_Date
WHERE &Alert and TENOR = '+'
and datepart(c.REPORT_DATE) ge '31-APR-21'd
and datepart(c.REPORT_DATE) le '31-APR-22'd
Group by BUSINESS_GROUP, SPLIT
)B
on a.BUSINESS_GROUP = b.PORTF_LEVEL1
AND a.SPLIT = b.PORTF_LEVEL2
INNER JOIN temp2 e
on a.REPORT_DATE = e.REPORT_DATE
where &Alert and TENOR = '+'
and datepart(a.REPORT_DATE) ge '31-APR-21'd
and datepart(a.REPORT_DATE) le '31-APR-22'd
Group by Business_GROUP, SPLIT
;
QUIT;
Table B:
In Table B, I am trying to find the median of the variable Data_M. The code seems to be okay; I only need help joining Table B to Table A above.
Proc sql outobs=1;
create table median_dt1 as
select distinct put(datepart(max(REPORT_DATE)), date9.) as M_Date
     , median(Data_M) as median_data
from transp
WHERE datepart(REPORT_DATE) ge '01-APR-22'd
  and datepart(REPORT_DATE) le '31-APR-22'd
group by BUSINESS_GROUP
order by Report_Date Desc;
quit;
Thank you in advance!
from temp a
INNER join
( select BUSINESS_GROUP as PORTF_LEVEL1
,SPLIT AS PORTF_LEv2
,Count(distinct c.Report_Date) as Number_of_Days
from temp c
Inner join temp2 d
on c.Report_Date=d.Report_Date
WHERE &Alert and TENOR = '+'
and datepart(c.REPORT_DATE) ge '31-APR-21'd
and datepart(c.REPORT_DATE) le '31-APR-22'd
Group by BUSINESS_GROUP, SPLIT
)B
on a.BUSINESS_GROUP = b.PORTF_LEVEL1
AND a.SPLIT = b.PORTF_LEVEL2
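You're trying to join on b.PORTF_LEVEL2. However, that column doesn't exist in B; the subquery names it PORTF_LEv2 (SPLIT AS PORTF_LEv2). Try joining on b.PORTF_LEv2 instead. Similarly, the outer SELECT uses B.TOTAL_BREACH, which the subquery never creates. Also check the date constants: April has only 30 days, so '31-APR-21'd and '31-APR-22'd are not valid dates and SAS will reject them.
If those changes don't resolve the issue, please paste the complete error message that you're receiving.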
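On the original goal of left-joining Table B to Table A: a left join needs a key present in both tables. As posted, the median query groups by BUSINESS_GROUP but does not select it, and OUTOBS=1 keeps only one row, so Table B has nothing to match on. In PROC SQL terms that would presumably mean selecting BUSINESS_GROUP in the median query, dropping OUTOBS=1, and joining with something like left join median_dt1 m on a.PORTF_LEVEL1 = m.BUSINESS_GROUP. A rough Python sketch of that logic, assuming BUSINESS_GROUP (PORTF_LEVEL1 in Table A) is the key and using made-up values:

```python
from statistics import median

# Made-up stand-ins for the posted tables; only the key and one value column are kept.
stand = [  # Table A: one row per PORTF_LEVEL1 / PORTF_LEv2 combination
    {"PORTF_LEVEL1": "G1", "PORTF_LEv2": "S1"},
    {"PORTF_LEVEL1": "G1", "PORTF_LEv2": "S2"},
    {"PORTF_LEVEL1": "G2", "PORTF_LEv2": "S1"},
    {"PORTF_LEVEL1": "G3", "PORTF_LEv2": "S1"},
]
transp = [  # source rows for Table B
    {"BUSINESS_GROUP": "G1", "Data_M": 4.0},
    {"BUSINESS_GROUP": "G1", "Data_M": 10.0},
    {"BUSINESS_GROUP": "G1", "Data_M": 6.0},
    {"BUSINESS_GROUP": "G2", "Data_M": 3.0},
]

# Table B: keep the grouping key next to the median so it can be joined on.
values = {}
for row in transp:
    values.setdefault(row["BUSINESS_GROUP"], []).append(row["Data_M"])
median_dt1 = {group: median(v) for group, v in values.items()}

# Left join B to A: every A row survives; groups with no match get None (missing).
joined = [dict(a, median_data=median_dt1.get(a["PORTF_LEVEL1"])) for a in stand]

for row in joined:
    print(row)
```

Group G3 has no rows in the Table B source, so it keeps its Table A row with a missing median, which is the behavior a left join should give.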
I want to display my data as either yes or no in the output for InitialTesting, SiteVisit, and FollowUp. How would I do that? The data set has numeric values for these, but I want character responses of "y" or "n".
PROC FORMAT;
VALUE SiteVisitfmt 1 = 'yes'
0 = 'no';
VALUE InitialTestingfmt 1 = 'yes'
2 = 'no';
VALUE TestEventfmt 1 = 'One Event '
2 = 'Two Events'
3 = 'Three Events'
4 = 'Four Events'
5 = 'Five Events';
VALUE FollowUpfmt 1 = 'yes'
0 = 'no';
FORMAT SiteVisit SiteVisitfmt. InitialTesting InitialTestingfmt. TestEvent TestEventfmt.
FollowUp FollowUpfmt.;
RUN;
data PMdataedits;
set PMdata (rename = (Number_of_Days_from_Onset_to_Sit =SiteVisit
Number_of_Days_between_Onset_and = InitialTesting
Number_of_Test_Events_in_IRIS = TestEvent
Number_of_Days_between_Test_1_an = FollowUp));
drop SPA;
attrib date1 format=date9.;
date1=input(date,mmddyy10.);
NewSiteVisit = put(SiteVisit, 8.);
NewInitialTesting = put(InitialTesting, 8.);
NewFollowUp = put(FollowUp, 8.);
NewSiteVisit=;
if (NewSiteVisit=<1) THEN NewSiteVisit= '1';
if (NewSiteVisit>1) THEN NewSiteVist= '0';
NewInitialTesting=;
if (NewInitialTesting<=2) THEN NewInitialTesting= '1';
if (NewInitialTesting>2) THEN NewInitialTesting='0';
This statement:
FORMAT SiteVisit SiteVisitfmt. InitialTesting InitialTestingfmt. TestEvent TestEventfmt.
FollowUp FollowUpfmt.;
needs to be in the data step (somewhere after data PMdataedits; and before the run; that you don't show), not in the PROC FORMAT. The FORMAT statement is what assigns a format to a variable, and each dataset (defined by a data step) has its own set of variables; they can share names with variables in other datasets but have different contents and formats.
Also note that you don't have to name the formats after the variables, and don't need three different yes/no formats. You could have done:
proc format;
value ynf
1='yes'
0='no'
;
run;
And then used
format sitevisit initialtesting followup ynf.;
And that would have covered all three of them with one format. But what you did is legal, it's just more typing than you need!
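If the underlying goal is to turn the day counts into yes/no flags, the logic can be sketched in Python: compare the day counts as numbers rather than converting them to character with PUT first, store 1/0, and keep one display mapping that plays the role of a single shared yes/no format. The thresholds (1 day for SiteVisit, 2 days for InitialTesting) come from the IF statements in the question; the records are made up. One SAS-specific caution: a missing numeric value compares as smaller than any number, so a test like SiteVisit <= 1 is true for missing values unless they are handled first, which is why the sketch checks for missing explicitly.

```python
# One shared display mapping, like a single yes/no format applied to several variables.
YES_NO = {1: "yes", 0: "no"}

# Thresholds taken from the IF statements in the question.
THRESHOLDS = {"SiteVisit": 1, "InitialTesting": 2}


def flag(days, limit):
    """1 if the day count is at or under the limit, 0 otherwise, None if missing."""
    if days is None:
        return None
    return 1 if days <= limit else 0


records = [  # made-up day counts
    {"SiteVisit": 0, "InitialTesting": 3},
    {"SiteVisit": 4, "InitialTesting": 1},
]

for rec in records:
    flags = {name: flag(rec[name], limit) for name, limit in THRESHOLDS.items()}
    # The stored flags stay numeric; only the printed values go through YES_NO.
    print({name: YES_NO.get(value, "") for name, value in flags.items()})
```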
In one scenario, my BigQuery function receives a string that I need to use as a column name.
Here is the function:
CREATE OR REPLACE FUNCTION METADATA.GET_VALUE(column STRING, row_number int64) AS (
(SELECT column from WORK.temp WHERE rownumber = row_number)
);
When I call this function as select METADATA.GET_VALUE("TXCAMP10",149); I get the value TXCAMP10, so it is evidently processed as SELECT "TXCAMP10" from WORK.temp WHERE rownumber = 149. What I need is SELECT TXCAMP10 from WORK.temp WHERE rownumber = 149, which returns a value from the temp table; let's say that value is A.
So ultimately I need the value A, not the column name TXCAMP10.
I tried using EXECUTE IMMEDIATE, e.g. execute immediate("SELECT" || column || "from WORK.temp WHERE rownumber =" ||row_number), based on this Stack Overflow post, but it turns out I can't use it in a function.
How do I achieve required result?
I don't think you can achieve this result with a SQL UDF in BigQuery.
But it is possible to do this with stored procedures in BigQuery and EXECUTE IMMEDIATE statement. Consider this code, which simulates the situation you have:
create or replace table d1.temp(
c1 int64,
c2 int64
);
insert into d1.temp values (1, 1), (2, 2);
create or replace procedure d1.GET_VALUE(column STRING, row_number int64, out result int64)
BEGIN
EXECUTE IMMEDIATE 'SELECT ' || column || ' from d1.temp where c2 = ?' into result using row_number;
END;
BEGIN
DECLARE result_c1 INT64;
call d1.GET_VALUE("c1", 1, result_c1);
select result_c1;
END;
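Because EXECUTE IMMEDIATE runs a statement built from a string, two details matter: the pieces need spaces around the column name (the concatenation in the question produces SELECTTXCAMP10from ...), and the column name should be validated before it is spliced in, since query parameters (the ? bound with USING) can stand in for values but not for column names. A small Python sketch of that string construction; the allowed-column set is a hypothetical stand-in for the real columns of d1.temp.

```python
import re

# Hypothetical whitelist: the columns that actually exist in d1.temp.
ALLOWED_COLUMNS = {"c1", "c2"}
IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_select(column: str) -> str:
    """Return the SQL text that EXECUTE IMMEDIATE would run for `column`.

    The row number stays a `?` placeholder (bound with USING); only the
    validated column name is spliced into the string.
    """
    if not IDENTIFIER.match(column) or column not in ALLOWED_COLUMNS:
        raise ValueError(f"unexpected column name: {column!r}")
    # Note the spaces on both sides of the column name.
    return "SELECT " + column + " from d1.temp where c2 = ?"


print(build_select("c1"))

# Concatenating without the spaces, as in the question's attempt:
print("SELECT" + "TXCAMP10" + "from WORK.temp WHERE rownumber =")
```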
After some research and trial and error, I used this workaround to solve the issue. It may not be the best solution when you have many columns, but it works.
CREATE OR REPLACE FUNCTION METADATA.GET_VALUE(column STRING, row_number int64) AS (
(SELECT case
when column = 'a' then a
when column = 'b' then b
when column = 'c' then c
when column = 'd' then d
when column = 'e' then e
end from WORK.temp WHERE rownumber = row_number)
);
And this gives the required results.
Point to note: the columns you use in the CASE expression must all be of the same data type (or coercible to a common supertype); otherwise it won't work.
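If the column list is long, the CASE expression can be generated instead of typed by hand. A Python sketch that builds the function text from a list of column names, assuming the names are plain identifiers and all share one data type (per the note above); the function, table, and parameter names are the ones from this answer.

```python
import re

IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def case_function_sql(columns):
    """Build the CREATE FUNCTION text with one WHEN branch per column."""
    for name in columns:
        if not IDENTIFIER.match(name):
            raise ValueError(f"not a plain identifier: {name!r}")
    branches = "\n".join(
        f"    when column = '{name}' then {name}" for name in columns
    )
    return (
        "CREATE OR REPLACE FUNCTION METADATA.GET_VALUE(column STRING, row_number int64) AS (\n"
        "  (SELECT case\n"
        f"{branches}\n"
        "  end from WORK.temp WHERE rownumber = row_number)\n"
        ");"
    )


print(case_function_sql(["a", "b", "c"]))
```

The generated text can then be pasted into the BigQuery console; regenerating it is the only change needed when columns are added.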
For some reason, when SAS runs the proportional hazards regression it includes the observations coded as . as a group in the results. I suspect it has something to do with how I created my variable (and that SAS thinks my numeric variable is character), but I can't figure out what I did wrong. I am using SAS 9.4.
data final; set final;
if edu_d = 'hs less' then edu_regress = 1;
else if edu_d = 'hs' then edu_regress = 1;
else if edu_d = 'some college' then edu_regress = 2;
else if edu_d = 'college plus' then edu_regress = 3;
else if edu_d = 'missing' then edu_regress=.;
run;
Then I run my regression:
proc phreg data=final;
class edu_regress;
model fuptime*dc(0)=edu_regress/rl;
run;
And the output is as follows:
Parameter    Level  DF  Estimate  Std Error  Chi-Square  Pr > ChiSq  Hazard Ratio  95% Hazard Ratio CL
edu_regress  .      1   0.10963   0.12941    0.7177      0.3969      1.116         0.866  1.438
edu_regress  1      1   0.22514   0.10949    4.2278      0.0398      1.252         1.011  1.552
edu_regress  2      1   0.21706   0.11410    3.6190      0.0571      1.242         0.993  1.554
Here . shows up as its own category instead of being treated as missing.
I'm sure I'm making a rookie mistake but I just can't figure it out.
I would clear your output, rerun the code, and check both the log and the output.
As I read the docs, to get missing values treated as a category you would need to have /missing on your CLASS statement, which you do not have in the code shown. Without that, I think missing values should be automatically excluded.
When I run PHREG with a CLASS variable that has missing values, I get a note in the log about observations being deleted due to missing values, and the output shows that the number of observations used is less than the number of observations read.
If SAS thinks edu_regress is character, that could happen if the variable was already on the dataset as character. This is one reason not to write data x; set x; and to create a new dataset instead. If this is indeed the problem, you should see notes in the log from the DATA step about numeric-to-character conversion.
Anyway, one way to adjust this is to use CALL MISSING. It sets a variable to missing correctly regardless of the type.
data final;
set final;
if edu_d = 'hs less' then edu_regress = 1;
else if edu_d = 'hs' then edu_regress = 1;
else if edu_d = 'some college' then edu_regress = 2;
else if edu_d = 'college plus' then edu_regress = 3;
else if edu_d = 'missing' then call missing(edu_Regress);
run;
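A rough Python analogy of the difference (not how SAS stores data internally): a recoding that produces a true missing value drops out of the class levels, while one that produces the text "." creates an extra level, because for a character variable only a blank counts as missing. The mapping mirrors the IF/ELSE chain above; the list of edu_d values is made up.

```python
# Recoding from the IF/ELSE chain above; None plays the role of a true missing value.
EDU_CODES = {"hs less": 1, "hs": 1, "some college": 2, "college plus": 3, "missing": None}


def class_levels(values):
    """Distinct levels a CLASS-style variable would get, skipping true missings."""
    return sorted({v for v in values if v is not None}, key=str)


edu_d = ["hs less", "hs", "some college", "college plus", "missing"]

numeric = [EDU_CODES[e] for e in edu_d]
# If the variable is character, "missing" can end up as the string "." instead.
character = ["." if EDU_CODES[e] is None else str(EDU_CODES[e]) for e in edu_d]

print(class_levels(numeric))    # missing excluded
print(class_levels(character))  # "." kept as its own level
```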
My dataset has four variables: Company is the company's name, Return is the Company's return on day Date, and Weight is the company's weight in the market.
I want to keep all variables from the original file and create an additional variable: the market return excluding the Company itself. The market return for stock 'a' is the weighted average return of all the other stocks in the market on the same Date, excluding stock a. For example, if there are 3 stocks in the market, a, b and c, the market return for stock a is Return(b)*[Weight(b)/(Weight(b)+Weight(c))] + Return(c)*[Weight(c)/(Weight(b)+Weight(c))]. Similarly, the market return for stock b is Return(a)*[Weight(a)/(Weight(a)+Weight(c))] + Return(c)*[Weight(c)/(Weight(a)+Weight(c))].
I tried PROC SUMMARY, but it cannot exclude stock a when calculating the market return for stock a:
PROC SUMMARY NWAY DATA ;
CLASS Date ;
VAR Return / WEIGHT = weight;
OUTPUT
OUT = output
MEAN (Return) = MarketReturn;
RUN;
Could anyone teach me how to solve this, please? I am relatively new to SAS, so I don't know whether I should use a loop or whether there is a better alternative.
This can be done with a bit of fancy algebra. It's not something that's built-in, though.
Basically:
Construct a "total" market return
Construct a stock by stock return (so just return of A)
Subtract out the portion that A contributes to total.
Because a weighted mean is just a weighted sum divided by the sum of the weights, it's easy to back out one stock's contribution.
Total return = (mean_A * wgt_A + mean_rest * wgt_rest) / (wgt_A + wgt_rest)
So, solve that for mean_rest, the weighted mean of every stock except A:
Exclusive return = (mean_all * wgt_all - mean_A * wgt_A) / (wgt_all - wgt_A), where mean_all is the total return and wgt_all = wgt_A + wgt_rest.
Something like this.
data returns;
input stock $ return weight;
datalines;
A .50 1
B .75 2
C .33 1
;;;;
run;
proc means data=returns;
class stock;
types () stock; *this is the default;
weight weight;
output out=means_out mean= sumwgt= /autoname;
run;
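data returns_excl;
/* Read the overall row (_TYPE_=0) once; variables read by SET are retained for later rows */
if _n_=1 then set means_out(where=(_type_=0) rename=(return_mean=tot_return return_sumwgt=tot_wgts));
/* Then read one row per stock (_TYPE_=1) */
set means_out(where=(_type_=1));
/* Remove this stock's share of the weighted total and divide by the remaining weight */
return_excl = (tot_return*tot_wgts-return_mean*return_sumwgt)/(tot_wgts-return_sumwgt);
run;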
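The same leave-one-out formula can be checked outside SAS for the layout in the question, with several companies per Date and every original row kept. The Python sketch below accumulates the total weighted return and total weight once per Date, then backs out each company's own contribution. The sample rows are made up, and a Date with only one company has no "rest of the market", so the sketch returns None there.

```python
from collections import defaultdict

rows = [  # (Company, Date, Return, Weight); made-up values
    ("a", "2020-01-02", 0.50, 1.0),
    ("b", "2020-01-02", 0.75, 2.0),
    ("c", "2020-01-02", 0.33, 1.0),
    ("a", "2020-01-03", 0.10, 3.0),
    ("b", "2020-01-03", -0.20, 1.0),
]


def add_market_return(rows):
    """Append the weighted mean return of all *other* companies on the same Date."""
    weighted_sum = defaultdict(float)  # sum of Return * Weight per Date
    weight_sum = defaultdict(float)    # sum of Weight per Date
    for company, date, ret, wgt in rows:
        weighted_sum[date] += ret * wgt
        weight_sum[date] += wgt

    out = []
    for company, date, ret, wgt in rows:
        rest_weight = weight_sum[date] - wgt
        if rest_weight == 0:
            market = None  # no other company on this Date
        else:
            market = (weighted_sum[date] - ret * wgt) / rest_weight
        out.append((company, date, ret, wgt, market))
    return out


for row in add_market_return(rows):
    print(row)
```

For stock a on the first Date this gives (0.75*2 + 0.33*1)/3 = 0.61, which matches the definition in the question.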