Replacing values of a column with its minimum value in SAS

I am very new to SAS and I have the following work table.
I want to create a new table in which columns Date and Z remain the same, but all values in column X are replaced with the minimum value in column X and all values in column Y are replaced with the minimum value in column Y.
Sample output is as follows.

You can use the fact that PROC SQL will automatically remerge aggregate statistics back onto detail observations.
proc sql;
create table want as
select date, min(x) as x, min(y) as y, z
from have
;
quit;

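If you don't want to use PROC SQL, you can adapt this code from https://blogs.sas.com/content/iml/2014/12/01/max-and-min-rows-and-cols.html. Note that, as written, it computes the minimum and maximum across the numeric variables of each observation (row-wise), not down a column.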
data MinMaxRows;
set sashelp.Iris;
array x {*} _numeric_; /* x[1] is 1st var,...,x[4] is 4th var */
min = min(of x[*]); /* min value for this observation */
max = max(of x[*]); /* max value for this observation */
run;
proc print data=MinMaxRows(obs=7);
var _numeric_;
run;
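For a column-wise minimum without PROC SQL, one possible adaptation is to compute the minimums once with PROC MEANS and then attach them to every row in a DATA step. This is only a sketch: it assumes the input table is named have and has the columns date, x, y and z described in the question, and the names mins, min_x and min_y are made up for illustration.

```sas
/* Assumed input: work.have with columns date, x, y, z */
proc means data=have noprint;
  var x y;
  /* one output row holding the column minimums */
  output out=mins(drop=_type_ _freq_) min=min_x min_y;
run;

data want;
  /* read the single row of minimums once; its values are retained */
  if _n_ = 1 then set mins;
  set have;
  x = min_x;   /* overwrite every x with the column minimum */
  y = min_y;   /* overwrite every y with the column minimum */
  drop min_x min_y;
run;
```

Date and z pass through unchanged because they are never assigned.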

Related

Cumulative sum of a variable by a condition and ID in SAS

I am trying to sum one variable as long as another remains constant. I want a cumulative sum of dur as long as a is constant. When a changes, the sum restarts. When a new id begins, the sum restarts too.
[image: input table]
I would like to get this:
[image: desired output table]
Thanks
You can use a BY statement to specify the variables whose distinct value combinations organize the data rows into groups. You reset an accumulated value at the start of each group and add to the accumulator at each row in the group. Use RETAIN to keep a new variable's value between iterations of the DATA step's implicit loop. The SUM statement is a SAS feature that both accumulates and retains.
Example:
data want;
set have;
by id a;
if first.a then mysum = 0;
mysum + dur;
run;
The SUM statement is different from the SUM function.
<variable> + <expression>; * SUM statement, unique to SAS (not found in other languages);
can be thought of as
retain <variable>;
<variable> = sum (<variable>, <expression>);
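To make the equivalence concrete, here is a small sketch that spells out the RETAIN form on made-up data. The five input rows are hypothetical (the question's table was only available as an image), and the data must already be sorted by id and a for the BY statement to work.

```sas
/* Hypothetical sample data, already sorted by id and a */
data have;
  input id a dur;
  datalines;
1 1 5
1 1 3
1 2 4
2 2 7
2 2 1
;

data want;
  set have;
  by id a;
  retain mysum;                    /* keep mysum across iterations */
  if first.a then mysum = 0;       /* restart when a (or id) changes */
  mysum = sum(mysum, dur);         /* same effect as: mysum + dur; */
run;
```

With these rows, mysum comes out as 5, 8, 4, 7, 8: first.a is also set when id changes, so the sum restarts at id 2 even though a is still 2.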
As far as I can tell, you need to self-join your table on a ranked column.
The rank should be computed within each id and a group.
WORK.QUERY_FOR_STCKOVRFLW is the table you provided in the screenshot.
PROC SQL;
CREATE TABLE WORK.QUERY_FOR_STCKOVRFLW_0001 AS
SELECT t1.id,
t1.a,
t1.dur,
/* mono */
(monotonic()) AS mono
FROM WORK.QUERY_FOR_STCKOVRFLW t1;
QUIT;
PROC SORT
DATA=WORK.QUERY_FOR_STCKOVRFLW_0001
OUT=WORK.SORTTempTableSorted
;
BY id a;
RUN;
PROC RANK DATA = WORK.SORTTempTableSorted
TIES=MEAN
OUT=WORK.RANKRanked(LABEL="Rank Analysis for WORK.QUERY_FOR_STCKOVRFLW_0001");
BY id a;
VAR mono;
RANKS rank_mono ;
RUN; QUIT;
PROC SQL;
CREATE TABLE WORK.QUERY_FOR_RANKRANKED AS
SELECT t1.id,
t1.a,
t1.dur,
/* SUM_of_dur */
(SUM(t2.dur)) FORMAT=BEST12. AS SUM_of_dur
FROM WORK.RANKRANKED t1
LEFT JOIN WORK.RANKRANKED t2 ON (t1.id = t2.id) AND (t1.a = t2.a AND (t1.rank_mono >= t2.rank_mono ))
GROUP BY t1.id,
t1.a,
t1.rank_mono, /* keep rows with equal dur in the same group separate */
t1.dur;
QUIT;

Sort all rows by length of string in variable X (longer strings first)

I have a variable UserName that contains IDs of varying length. A shortened example:
How can I sort all rows by variable X so that longer strings are listed first?
Context: This is for calculating HEI 2015 scores using the ASA24 macro. It writes:
/*Note: Some users have found that the SAS program will drop observations from the analysis if the ID field is not the same length for all observations. To prevent this error, the observations with the longest ID length should be listed first when the data is imported into SAS. */
Use PROC SQL with an ORDER BY clause that sorts on a value computed in a CASE expression.
The expression case when length(X) > 8 then -length(X) else 0 end ensures the longest values come first when sorted, and all values whose length is at or below the capping length (8) are treated equally.
ORDER BY length(X) desc, X would also put the longest X values first and then order by X itself, but length would dominate the ordering even among values of length <= 8.
data have;
length X $50;
input X; datalines;
GFHsp036
GFHsp038
GFHsp039
GFHsp040
GFHsp0400
GFHsp0401
GFHsp0402
GFHsp04021
;
proc sql;
create table want as
select * from have
order by
case when length(x) > 8 then -length(X) else 0 end,
X
;
quit;
proc print;
var X / style=[fontfamily='Courier'];
run;
Here is probably the simplest way to do this:
data have;
input string $;
datalines;
abcde
ab
a
abcd
abc
;
proc sql;
create table want as
select * from have
order by length(string) desc;
quit;
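If PROC SQL is not an option, the same ordering can be sketched with a DATA step and PROC SORT. This reuses the have data set and the string variable from the example above; have_len and len are made-up helper names.

```sas
/* Add a helper column holding each string's length */
data have_len;
  set have;
  len = length(string);
run;

/* Longest strings first; ties broken alphabetically; helper dropped */
proc sort data=have_len out=want(drop=len);
  by descending len string;
run;
```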
Re-ordering the IDs did not help in my case, because PROC IMPORT needed GUESSINGROWS=MAX.
Please see "SAS Macro Truncating IDs" for how to fix the truncated IDs that this question attempted to address.
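For reference, a minimal sketch of an import that scans every row before choosing column lengths might look like the following. The file name ids.csv and the output name work.ids are hypothetical, and the CSV format is an assumption, since the original import step is not shown.

```sas
/* "ids.csv" and work.ids are placeholder names */
proc import datafile="ids.csv"
            out=work.ids
            dbms=csv
            replace;
  guessingrows=max;   /* scan all rows before deciding types and lengths */
run;
```

Because every row is scanned, the ID column is sized for the longest value regardless of row order.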

SAS: reference cell values of a table

I'm trying to reference values that are in a stats table, as such:
/* Calculate Median and IQR */
PROC UNIVARIATE DATA = kddcup98(drop=TARGET_B) OUTTABLE= boxStats(keep=_VAR_ _Q1_ _Q3_ _QRANGE_) NOPRINT;
RUN;
/* Calculate upper and lower bounds */
DATA boxStats;
SET boxStats;
upper_bound = _Q3_ + 1.5*_QRANGE_;
lower_bound = _Q1_ - 1.5*_QRANGE_;
RUN;
DATA kddcup98_continuous;
SET kddcup98_continuous;
ARRAY Num_Col[*] _NUMERIC_;
DO i = 1 to dim(Num_Col);
IF Num_Col[i] > boxStats[i, "upper_bound"] OR Num_Col[i] < boxStats[i, "lower_bound"] THEN Num_Col[i] = .;
END;
RUN;
I have the main data table and a table of stats from which I computed upper and lower bounds. I need to reference those values from the boxStats table. How can I reference those values?
Use the OUTTABLE= option of the PROC UNIVARIATE statement.
OUTTABLE=SAS-data-set
creates an output data set that contains univariate statistics arranged in tabular form, with one observation per analysis variable. See the section OUTTABLE= Output Data Set for details.
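Since OUTTABLE= yields one row per analysis variable, one way to look those rows up from the main table is a hash object keyed on _VAR_. The following is only a sketch under several assumptions: boxStats already contains the upper_bound and lower_bound columns computed in the question, the _VAR_ values match the column names in kddcup98_continuous exactly (including case), and $32 is long enough for _VAR_. The output name kddcup98_clean is made up.

```sas
data kddcup98_clean;
  set kddcup98_continuous;
  /* Declared right after SET so the array holds only the input's numeric columns */
  array Num_Col[*] _numeric_;
  /* Host variables for the hash lookup; $32 is an assumed length for _VAR_ */
  length _VAR_ $32 upper_bound lower_bound 8;
  if _n_ = 1 then do;
    declare hash h(dataset: "boxStats");   /* load the stats table once */
    h.defineKey("_VAR_");
    h.defineData("upper_bound", "lower_bound");
    h.defineDone();
  end;
  do i = 1 to dim(Num_Col);
    /* find() returns 0 when the column name has a row in boxStats */
    if h.find(key: vname(Num_Col[i])) = 0 then do;
      if Num_Col[i] > upper_bound or Num_Col[i] < lower_bound
        then Num_Col[i] = .;
    end;
  end;
  drop i _VAR_ upper_bound lower_bound;
run;
```

Columns without a matching row in boxStats are left untouched.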

Need to compute column total in SAS and use it as input to calculate another column

Data IV_SAS;
set IV;
Total_Loans=Goods+Bads;
Dist_Loans=Total_Loans/sum(Total_Loans);
Dist_Goods=Goods/Sum(Goods);
Dist_Bads=Bads/Sum(Bads);
Difference=Dist_Goods-Dist_Bads;
WOE=log10(Dist_goods/Dist_Bads);
IV=WOE*Difference;
run;
I am having trouble calculating the sum of Total_Loans: it computes the row total instead of the column total.
That's how Base SAS works - it operates on row level in the data step.
You would want to use PROC MEANS or PROC TABULATE or similar proc and find the column total there, then merge that on (or combine in another method).
For example:
proc means data=sashelp.class;
var age height weight;
output out=class_means sum(age)=age_sum sum(height)=height_sum sum(weight)=weight_Sum;
run;
data class;
if _n_=1 then set class_means;
set sashelp.class;
age_prop = age/age_sum;
height_prop = height/height_sum;
weight_prop = weight/weight_Sum;
run;
Alternately, use SAS/IML or PROC SQL, both of which will operate on the column level when asked inline (though I think the above solution is likely superior in speed to both due to lower overhead).
data a;
input goods bads;
datalines;
36945 33337
23820 21761
26990 24647
33195 30299
43755 39014
46100 41100
89765 79978
25940 23508
35940 32506
31840 28846
33430 30366
34480 31388
36640 33129
39640 35992
42490 38325
44240 40075
42840 38840
49690 44936
69190 64740
;
run;
proc sql;
create table b as
select goods,bads,
sum(goods,bads) as Total_Loans format=dollar10.,
sum(goods)as Column_goods_tot format=dollar10. ,
sum(bads) as Column_bads_tot format=dollar10. ,
sum(calculated Column_goods_tot, calculated Column_bads_tot) as Column_Total_Loans format=dollar10. ,
(calculated Total_Loans/calculated Column_Total_Loans) as Dist_Loans
/*add more code to calculate Dist_Goods, Dist_Bads, etc..*/
from a;
quit;
/*Column totals only*/
proc sql;
create table c as
select
sum(goods)as Column_goods_tot format=dollar10. ,
sum(bads) as Column_bads_tot format=dollar10. ,
sum(calculated Column_goods_tot, calculated Column_bads_tot) as Column_Total_Loans format=dollar12.
from a;
quit;
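The remaining columns from the question could be filled in along the same lines. The sketch below reuses table a from above and follows the question's formulas, including its use of log10; the output table name woe is made up. PROC SQL remerges the column totals onto each row, so expect a note about remerging in the log.

```sas
proc sql;
  create table woe as
  select goods, bads,
         goods + bads                        as Total_Loans,
         (goods + bads) / sum(goods + bads)  as Dist_Loans,   /* row share of the column total */
         goods / sum(goods)                  as Dist_Goods,
         bads / sum(bads)                    as Dist_Bads,
         calculated Dist_Goods - calculated Dist_Bads           as Difference,
         log10(calculated Dist_Goods / calculated Dist_Bads)    as WOE,
         calculated WOE * calculated Difference                 as IV
  from a;
quit;
```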

Using SAS I want to print the 5 rows with the highest value in a certain column

I have a dataset with several columns. I can get the extreme observations for a column or set of columns like this ...
PROC Univariate data = Work.tempVal
nextrobs = 5 ;
ods select ExtremeObs ;
ods output ExtremeObs = ExtremeObs;
var B C;
run;
What I would like to do is print out the dataset row for each one of the extreme observations. So I am getting the column that I am targeting for extremity but I want the rest of the columns as well.
It turns out the ID statement includes other columns in the output.
So:
PROC Univariate data = Work.tempVal
nextrobs = 5 ;
ods select ExtremeObs ;
ods output ExtremeObs = ExtremeObs;
var B;
id A C D;
run;
will return columns A, B, C, and D where B is an extreme observation.
You can use proc sql with the outobs= option, and the appropriate sort order.
For example, to get the rows with the top 5 maximum values:
proc sql outobs=5 ;
create table top5 as
select *
from mydata
order by targetvar descending ;
quit ;
Obviously, if you've got multiple rows with the same maximum value, you may want to use a different approach.
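One such approach, sketched here with PROC RANK, keeps every row that ties for a top-5 position instead of cutting off at five rows. It reuses the mydata and targetvar names from the example above; ranked, targetrank and top5_ties are made-up names.

```sas
/* Rank 1 = largest value; tied values share the lowest rank of the tie */
proc rank data=mydata out=ranked descending ties=low;
  var targetvar;
  ranks targetrank;
run;

data top5_ties;
  set ranked;
  /* lower bound excludes rows whose rank is missing */
  where 1 <= targetrank <= 5;
run;
```

This can return more than five rows when values are tied at or above the fifth position.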