SAS PHREG test for trend? - sas

I am running a PHREG model with mortality as the outcome and quartiles of protein X as my predictor (quartile 1 = reference). I was able to get the HRs for quartiles 2, 3, and 4 and their respective p-values, which are not significant (0.8, 0.4, 0.2). I am now wondering whether there is a way to test for a trend in the HRs. Any suggestions or tips for the analysis would be appreciated!

Related

Maximize sum of (n-combination) rows with constraint

I have a table with 3 columns: Product, Cost, Revenue.
There are 1000 rows (1000 distinct Products).
My budget equals B.
I want to select 3 Products that maximize the Revenue sum, provided that the total Cost of those 3 Products is less than my budget B.
Can this be done in Power BI?
I would appreciate any kind of help.
Thank you in advance.
Best Regards.
As you didn't share any data, I created a scenario with a sample dataset. I hope it is similar to your case:
Please note that if you want to paste the code into Power BI Desktop, you need to remove the "EVALUATE" statement. Also, table functions require you to create a new table to see the results, unless you plan to use them as table expressions in your DAX code.
First, create a table:
DEFINE
    TABLE FactTable =
        SELECTCOLUMNS (
            {
                ( 102, 12.5, 14.5 ),
                ( 103, 20.5, 21.5 ),
                ( 104, 12.5, 13.0 ),
                ( 105, 23.5, 25.5 ),
                ( 106, 12.0, 15.0 ),
                ( 107, 10.0, 14.0 ),
                ( 108, 16.5, 18.5 ),
                ( 109, 25.0, 28.0 ),
                ( 110, 12.5, 18.0 ),
                ( 111, 14.5, 16.5 ),
                ( 112, 15.5, 17.5 )
            },
            "Product", [Value1],
            "Cost", [Value2],
            "Revenue", [Value3]
        )
    VAR MyBudget = 50.0
EVALUATE
    FactTable
Now let's run some tests.
The most profitable products are the ones where [Revenue] - [Cost] is greatest. Let's find out which products they are.
Code to test the top 3 highest [Revenue] - [Cost] values:
TOPN (3,FactTable,[Revenue]- [Cost])
Note: TOPN includes ties, which is why 4 rows are returned.
Now let's find the most costly products:
TOPN (3,FactTable,[Cost])
Now it is time to build the filter context for CALCULATE. We need the products where [Revenue] - [Cost] is greatest and whose [Cost] is not among the highest. We will use the EXCEPT() function in DAX:
EXCEPT(
    TOPN (3, FactTable, [Revenue] - [Cost]),
    TOPN (3, FactTable, [Cost])
)
Now let's test whether the total [Cost] of the selected products is within my budget (here we check it against our variable MyBudget = 50):
EVALUATE
ROW(
    "Top3 Total Less Than My Budget",
    SUMX(
        EXCEPT(
            TOPN (3, FactTable, [Revenue] - [Cost]),
            TOPN (3, FactTable, [Cost])
        ),
        [Cost]
    ) <= MyBudget
)
Finally, since our test result is "True", we can write the final code to calculate the top 3 revenue total using the CALCULATE function:
EVALUATE
ROW(
    "Top3 Total Revenue",
    CALCULATE(
        SUM(FactTable[Revenue]),
        EXCEPT(
            TOPN (3, FactTable, [Revenue] - [Cost]),
            TOPN (3, FactTable, [Cost])
        )
    )
)
Comparison:
I hope this is helpful.
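
As a cross-check outside Power BI, the constraint in the question (pick 3 products, total cost within budget, maximum total revenue) can be verified by exhaustive search. The following is a minimal Python sketch, not a DAX solution and not the TOPN/EXCEPT approach above, which is a heuristic. It reuses the sample FactTable rows and the budget of 50; the names FACT_TABLE, MY_BUDGET, and best_combination are made up for illustration, and the "<=" budget test mirrors the check used above.

```python
from itertools import combinations

# Sample rows copied from the FactTable above: (product, cost, revenue).
FACT_TABLE = [
    (102, 12.5, 14.5),
    (103, 20.5, 21.5),
    (104, 12.5, 13.0),
    (105, 23.5, 25.5),
    (106, 12.0, 15.0),
    (107, 10.0, 14.0),
    (108, 16.5, 18.5),
    (109, 25.0, 28.0),
    (110, 12.5, 18.0),
    (111, 14.5, 16.5),
    (112, 15.5, 17.5),
]
MY_BUDGET = 50.0


def best_combination(rows, budget, size=3):
    """Return (total_revenue, total_cost, products) for the affordable
    combination with the highest total revenue, or None if none fits."""
    best = None
    for combo in combinations(rows, size):
        total_cost = sum(cost for _, cost, _ in combo)
        if total_cost > budget:
            continue  # over budget, skip this combination
        total_revenue = sum(revenue for _, _, revenue in combo)
        if best is None or total_revenue > best[0]:
            products = tuple(product for product, _, _ in combo)
            best = (total_revenue, total_cost, products)
    return best


result = best_combination(FACT_TABLE, MY_BUDGET)
print(result)  # (61.0, 49.5, (106, 109, 110))
```

With 1000 products there are about 166 million combinations of 3, so a plain loop like this is slow but still feasible as a one-off check.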

Power BI - Weekly X-Axis without year, but still sorted chronologically

Since I started using Power BI for my weekly reports, I have not been able to resolve the following issue.
Most of my charts are on a weekly basis, so for example a chart of the last 13 weeks will have the following weeks as x-axis values: 47, 48, 49, 50, 51, 52, 1, 2, 3, 4, 5, 6, and 7.
The problem is that if I sort by week, Power BI always sorts the values like this: 1, 2, 3, 4, 5, 6, 7, 47, 48, 49, 50, 51, 52. This is obviously wrong, because week 1 of the following year should be placed after week 52 of the previous year.
So I came to the following solution: I concatenated the year value with the week value, so the values look like this: ..., 202151, 202152, 202201, 202202, ...
This solved at least my sorting issue. The graphs are displayed in the correct chronological order.
But there is one problem with this solution: the six-digit values are far too long, and it is very hard for the reader to distinguish between the weeks.
(Image: too large markers)
I would like to have only the 2-digit week number displayed on the x-axis, but still keep the sorting properly over the change of a year.
I hope someone has solved this. Thank you in advance!
Configure YearWeek as the "sort by column" for WeekNumber, and switch the X-Axis type to "Categorical" on your Bar Chart or Line Chart.
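
The idea behind a sort-by column (order by one key, display another) can be illustrated outside Power BI. Here is a minimal Python sketch with made-up data; the rows and the year_week_key helper are hypothetical and only show why the combined year-week key keeps week 1 after week 52 while the label stays two digits.

```python
# Hypothetical weekly data: (year, week number, value).
rows = [
    (2022, 1, 12),
    (2021, 51, 9),
    (2022, 2, 15),
    (2021, 52, 11),
    (2021, 50, 7),
]


def year_week_key(year, week):
    """Build a sortable key such as 202152 or 202201."""
    return year * 100 + week


# Sort by the combined key, but keep only the two-digit week as the label.
ordered = sorted(rows, key=lambda row: year_week_key(row[0], row[1]))
labels = [f"{week:02d}" for _, week, _ in ordered]
print(labels)  # ['50', '51', '52', '01', '02']
```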
Perhaps you can use small multiples:
Rough steps:
Create a column chart
In the Axis field place your week number column
In the Values field place your column containing the values
In the Small multiples field place the year column
Format your visual such that the Small multiple grid has 1 row, but multiple columns.

Why are my zero values returning as the maximum value in my running total in PowerBI?

I am a novice user of Power BI, so please bear with me. I have a table that is supposed to return running totals in descending order. As such, you would expect the values to decrease continuously as you move down the table. I noticed that one column didn't follow that pattern, so I checked the raw data, and those final two rows should actually have returned zero. Below is the quick measure that I generated to get these values.
Count of Candidate running total in Last Recruiting Stage =
CALCULATE(
    COUNTA('Sheet1'[Candidate]),
    FILTER(
        ALLSELECTED('Sheet1'[Last Recruiting Stage]),
        ISONORAFTER('Sheet1'[Last Recruiting Stage], MIN('Sheet1'[Last Recruiting Stage]), ASC)
    )
)
Below is the table that is generated. The column to note is the Veterans at each stage. It should go (from top to bottom) 5, 5, 5, 3, 0, 0.

Wilcoxon Z score is negative when it should be positive and vice versa

SAS coding: I perform a t-test on the difference between two groups (independent, but from the same population). The sign of the difference and the sign of the t-statistic match: if the difference between the two groups is negative, the t-statistic is negative, and if the difference is positive, the t-statistic is positive.
However, when I run a Wilcoxon rank-sum test, the signs of my z-scores don't match the sign of the group difference: when the difference between the two groups is negative the z-score is positive, and when the difference is positive the z-score is negative.
I have tried sorting the dataset in both ascending and descending order.
Here's my code:
proc sort data = fundawin3t;
by vb_nvb_TTest;
run;
**Wilcoxon rank sums for vb vs nvb firms.;
proc npar1way data = fundawin3t wilcoxon;
title "NVB vs VB univariate tests and Wilcoxon-Table 4";
var ma_score_2015 age mve roa BM BHAR prcc_f CFI CFF momen6 vb_nvb SERIAL recyc_v;
class vb_nvb_TTest;
run;
Here is my log:
3208
3209 proc sort data = fundawin3t;
3210 by vb_nvb_TTest;
3211 run;
NOTE: Input data set is already sorted, no sorting done.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds
3212
3213 **Wilcoxon rank sums for vb vs nvb firms.;
3214 proc npar1way data = fundawin3t wilcoxon;
3215 title "NVB vs VB univariate tests and Wilcoxon-Table 4";
3216 var ma_score_2015 age mve roa BM BHAR prcc_f CFI CFF momen6
tenure vb_nvb SERIAL
3216! recyc_v;
3217 class vb_nvb_TTest;
3218 run;
NOTE: PROCEDURE NPAR1WAY used (Total process time):
real time 6.59 seconds
cpu time 5.25 seconds
RTM
To compute the linear rank statistic S, PROC NPAR1WAY sums the scores of the observations in the smaller of the two samples. If both samples have the same number of observations, PROC NPAR1WAY sums those scores for the sample that appears first in the input data set.
PROC NPAR1WAY computes one-sided and two-sided asymptotic p-values for each two-sample linear rank test. When the test statistic z is greater than its null hypothesis expected value of 0, PROC NPAR1WAY computes the right-sided p-value, which is the probability of a larger value of the statistic occurring under the null hypothesis. When the test statistic is less than or equal to 0, PROC NPAR1WAY computes the left-sided p-value, which is the probability of a smaller value of the statistic occurring under the null hypothesis. The one-sided p-value $P_1(z)$ can be expressed as $P_1(z) = \mathrm{Prob}(Z > z)$ if $z > 0$, and $P_1(z) = \mathrm{Prob}(Z < z)$ if $z \le 0$, where $Z$ has a standard normal distribution.
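
To see why the sign of z depends on which sample's scores are summed, here is a minimal Python sketch of the rank-sum z statistic. It is an illustration under simplifying assumptions, not a reproduction of PROC NPAR1WAY: the data are made up, there are no ties, and no continuity correction is applied. The function name rank_sum_z is hypothetical.

```python
from math import sqrt


def rank_sum_z(summed, other):
    """Normal-approximation z for the rank sum of `summed`.

    Assumes all values are distinct (no tie correction) and applies
    no continuity correction.
    """
    pooled = sorted(summed + other)
    rank = {value: i + 1 for i, value in enumerate(pooled)}
    n1, n2 = len(summed), len(other)
    total = n1 + n2
    s = sum(rank[value] for value in summed)   # rank sum of the chosen sample
    expected = n1 * (total + 1) / 2            # mean of S under the null
    variance = n1 * n2 * (total + 1) / 12      # variance of S under the null
    return (s - expected) / sqrt(variance)


# Hypothetical data: group_a is the smaller sample and has the lower values.
group_a = [1.2, 2.3, 3.1]
group_b = [2.8, 3.5, 4.0, 4.4, 5.1]

z_a = rank_sum_z(group_a, group_b)  # scores summed over the smaller sample
z_b = rank_sum_z(group_b, group_a)  # scores summed over the larger sample
print(round(z_a, 3), round(z_b, 3))  # -1.938 1.938
```

The two z values have the same magnitude and opposite signs. So if the t-test difference is computed as (larger group minus smaller group) while the rank sum is taken over the smaller group, the signs will disagree even though both tests point in the same direction.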

Plotting log odds against mid-point of category

I have a binary outcome variable (disease) and a continuous independent variable (age). There's also a cluster variable clustvar. Logistic regression assumes that the log odds is linear with respect to the continuous variable. To visualize this, I can categorize age as (for example, 0 to <5, 5 to <15, 15 to <30, 30 to <50 and 50+) and then plot the log odds against the category number using:
logistic disease i.agecat, vce(cluster clustvar)
margins agecat, predict(xb)
marginsplot
However, since the categories are not equal width, it would be better to plot the log odds against the mid-point of the categories. Is there any way that I can manually define that the values plotted on the x-axis by marginsplot should be 2.5, 10, 22.5, 40 and (slightly arbitrarily) 60, and have the points spaced appropriately?
If anyone is interested, I achieved the required graph as follows:
Recategorised age variable slightly differently using (integer) labels that represent the mid-point of the category:
gen agecat = .
replace agecat = 3 if age<6
replace agecat = 11 if age>=6 & age<16
replace agecat = 23 if age>=16 & age<30
replace agecat = 40 if age>=30 & age<50
replace agecat = 60 if age>=50 & age<.
For labelling purposes, created a label:
label define agecat 3 "Less than 5y" 11 "10 to 15y" 23 "15 to <30y" 40 "30 to <50y" 60 "Over 50 years"
label values agecat agecat
Ran logistic regression as above:
logistic disease i.agecat, vce(cluster clustvar)
Used margins and plot using marginsplot:
margins agecat, predict(xb)
marginsplot
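
For readers who want to check the plotted points by hand, here is a minimal Python sketch that computes the observed log odds of disease per age band and keys them by the same integer mid-points as the recode above. The records are made up, and the function names are hypothetical. It ignores clustering and missing ages; it only gives point estimates, which for a model containing only the categorical age variable should coincide with the observed log odds in each category.

```python
from math import log


def age_midpoint(age):
    """Map an age to the integer mid-point label used in the recode above."""
    if age < 6:
        return 3
    if age < 16:
        return 11
    if age < 30:
        return 23
    if age < 50:
        return 40
    return 60


def log_odds_by_midpoint(records):
    """Return {mid-point: ln(cases / non-cases)} for (age, disease) records.

    Bands with zero cases or zero non-cases are skipped because their
    log odds are undefined.
    """
    counts = {}
    for age, disease in records:
        mid = age_midpoint(age)
        cases, total = counts.get(mid, (0, 0))
        counts[mid] = (cases + disease, total + 1)
    return {
        mid: log(cases / (total - cases))
        for mid, (cases, total) in sorted(counts.items())
        if 0 < cases < total
    }


# Hypothetical records: (age in years, disease indicator 0/1).
records = [
    (2, 0), (4, 1), (5, 0),      # band 3: 1 case, 2 non-cases
    (10, 1), (12, 0),            # band 11: 1 case, 1 non-case
    (20, 1), (25, 1), (28, 0),   # band 23: 2 cases, 1 non-case
    (45, 1), (35, 0),            # band 40: 1 case, 1 non-case
    (70, 1), (65, 1), (55, 0),   # band 60: 2 cases, 1 non-case
]

result = log_odds_by_midpoint(records)
for mid, value in result.items():
    print(mid, round(value, 3))
```

Plotting the values against the keys (3, 11, 23, 40, 60) spaces the points by age rather than by category number.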