Moving variable to next column in SAS - sas

I'm having the following table in SAS
SAS Table: Price
ID Description Price Discount
20 Hot blue warm 12.0
21 Durable A 15.0 0
22 Flexible 13.5 0
23 Bendable and A 12.3
I'm planning to move 'warm' and 'and A' from Price column to Description column while '12.0' and '12.3' to Price, what should I do?

You cannot change the type of an existing variable, but you can change the name.
Use the INPUT() function to convert strings to numbers. You can use the ?? modifier to suppress errors generated by strings that do not represent numbers.
data want;
set price(rename=(price=char_price));
price = input(char_price,??32.)
if missing(price) then description=catx(' ',description,char_price);
run;

Related

Calculate % of row total in SAS visual Analytics

I am trying to create a table which has two categories - X an Y. I am trying to create a table in SAS visual analytics that tells me the share of total in each category. My table looks something like this
Category A
Catgeoy B
Total
40%
60%
100%
I was trying to follow the below link but unfortunately my version of SAS VA does not have Aggregated measure ( tabular) option in it so I do not know how can I proceed forward with it.
How can i go about creating one without the aggregated tabular option
https://communities.sas.com/t5/SAS-Communities-Library/SAS-Visual-Analytics-Report-Example-Percent-of-Total-For-All-For/ta-p/636030
To do this in VA 7.5, we'll use a Crosstab object, a transposed form of your data, and use the "Percent of row total" calculation option within the crosstab. Let's use the below data for our example:
data have;
input id x y;
datalines;
1 40 60
2 30 70
3 90 10
;
run;
Step 1: Transpose to long and create by-groups
Transpose your data so that it is in a long format, then load it and register it to LASR.
proc transpose data = have
out = want(rename=(COL1 = value))
name = category
;
by id;
var x y;
run;
Output:
id category value
1 x 40
1 y 60
2 x 30
2 y 70
3 x 90
3 y 10
Step 2: Create a crosstab
Change id to a category, then create a crosstab that looks like this:
Columns: category
Rows: id
Measures: value
Go to Options --> Scroll to the bottom --> expand "Totals and Subtotals," and Enable "Totals" for rows and set the Placement to "After."
Step 3: Create a row-level Percent Calculation
Right-click the header value within the table and select "Create and add calculation...".
Select "Percent of row total - Sum" under the "Type" drop-down menu.
Remove Value as a role from the crosstab graph, format Percent to have 0 decimal places, and you'll have a table with row-wise percentages.

Summing a Column By Group In a Dataset With Macros

I have a dataset that looks like:
Month Cost_Center Account Actual Annual_Budget
June 53410 Postage 13 234
June 53420 Postage 0 432
June 53430 Postage 48 643
June 53440 Postage 0 917
June 53710 Postage 92 662
June 53410 Phone 73 267
June 53420 Phone 103 669
June 53430 Phone 90 763
...
I would like to first sum the Actual and Annual columns, respectively and then create a variable where it flags if the Actual extrapolated for the entire year is greater than than Annual column.
I have the following code:
Data Test;
set Combined;
%All_CC; /*MACRO TO INCLUDE ALL COST CENTERS*/
%Total_Other_Expenses;/*MACRO TO INCLUDE SPECIFIC Account Descriptions*/
Sum_Actual = sum(Actual);
Sum_Annual = sum(Annual_Budget);
Run_Rate = Sum_Actual*12;
if Run_Rate > Sum_Annual then Over_Budget_Alarm = 1;
run;
However, when I run this code, it does not sum by group, for example, this is the output I get:
Account_Description Sum_Actual Sum_Annual Run_Rate Over_Budget_Alarm
Postage 13 234 146
Postage 0 432 0
Postage 48 643 963 1
Postage 0 917 0
Postage 92 662 634 1
I'm looking for output where all the 'postage' are summed for Actual and Annual, leaving just one row of data.
Use PROC MEANS to summarize the data
Use a data step and IF/THEN statement to create your flags.
proc means data=have N SUM NWAY STACKODS;
class account;
var amount annual_budget;
ods output summary = summary_stats1;
output out = summary_stats2 N = SUM= / AUTONAME;
run;
data want;
set summary_stats;
if sum_actual > sum_annual_budget then flag=1;
else flag=0;
run;
SAS DATA step behavior is quite complex ("About DATA Step Execution" in SAS Language Reference: Concepts). The default behavior, that you're seeing, is: at the end of each iteration (i.e. for each input row) the row is written to the output data set, and the PDV - all data step variables - is reset.
You can't expect to write Base SAS "intuitively" without spending a few days learning it first, so I recommend using PROC SQL, unless you have a reason not to.
If you really want to aggregate in data step, you have to use something called BY groups processing: after ensuring the input data set is sorted by the BY vars, you can use something like the following:
data Test (keep = Month Account Sum_Actual Sum_Annual /*...your Run_Rate and Over_Budget_Alarm...*/);
set Combined; /* the input table */
by Month Account; /* must be sorted by these */
retain Sum_Actual Sum_Annual; /* don't clobber for each input row */
if first.account then do; /* instead do it manually for each group */
Sum_Actual = 0;
Sum_Annual = 0;
end;
/* accumulate the values from each row */
Sum_Actual = sum(Sum_Actual, Actual);
Sum_Annual = sum(Sum_Annual, Annual_Budget);
/* Note that Sum_Actual = Sum_Actual+Actual; will not work if any of the input values is 'missing'. */
if last.account then do;
/* The group has been processed.
Do any additional processing for the group as a whole, e.g.
calculate Over_Budget_Alarm. */
output; /* write one output row per group */
end;
run;
Proc SQL can be very effective for understanding aggregate data examination. With out seeing what the macros do, I would say perform the run rate checks after outputting data set test.
You don't show rows for other months, but I must presume the annual_budget values are constant across all months -- if so, I don't see a reason to ever sum annual_budget; comparing anything to sum(annual_budget) is probably at the incorrect time scale and not useful.
From the show data its hard to tell if you want to know any of these
which (or if some) months had a run_rate that exceeded the annual_budget
which (or if some) months run_rate exceeded the balance of annual_budget (i.e. the annual_budget less the prior months expenditure)
Presume each row in test is for a single year/month/costCenter/account -- if not the underlying data would have to be aggregated to that level.
Proc SQL;
* retrieve presumed constant annual_budget values from data;
* this information might (should) already exist in another table;
* presume constant annual budget value at each cost center | account combination;
* distinct because there are multiple months with the same info;
create table annual_budgets as
select distinct Cost_Center, Account, Annual_Budget
from test;
create table account_budgets as
select account, sum(annual_budget) as annual_budget
from annual_budgets
group by account;
* flag for some run rate condition;
create table annual_budget_mon_runrate_check as
select
2019 as year,
account,
sum(actual) as yr_actual, /* across all month/cost center */
min (
select annual_budget from account_budgets as inner
where inner.account = outer.account
) as account_budget,
max (
case when actual * 12 > annual_budget then 1 else 0 end
) as
excessive_runrate_flag label="At least one month had a cost center run rate that would exceed its annual_budget")
from
test as outer
group by
year, account;
You can add a where clause to restrict the accounts processed.
Changing the max to sum in the flag computation would return the number of cost center months with excessive run rates.

xline option when date is formatted %th?

I'm doing a connected twoway plot with x-axis as dates formatted as %th with values 2011h1 to 2017h2. I want to put a vertical line at 2016h2 but nothing I've tried has worked.
xline(2016h2)
xline("2016h2")
xline(date==2016h2)
xline(date=="2016h2")
I'm thinking it might be because I formatted dates with
gen date = yh(year, half)
format date %th
I think this is a MWE:
age1820 date
10.42 2011h1
10.33 2011h2
11.66 2012h1
11.01 2012h2
14.29 2013h1
10.95 2013h2
12.42 2014h1
7.04 2014h2
7.07 2015h1
6.95 2015h2
4 2016h1
8.07 2016h2
5.98 2017h1
3.19 2017h2
graph twoway connected age1820 date, xline(2016h2)
Your example will not really work as written without some additional work. I think in future posts you may want to shoot for a fully working example to maximize the chance that you get a good answer quickly. This is why I made up some fake data below.
Try something like this:
clear
set obs 20
gen date = _n + 100
format date %th
gen age = _n*2
display %th 116
display %th 117
tw connected age date, xline(116 `=th(2018h2)') tline(2019h1)
The crux of the matter is that Stata deals with dates as integers that have a special label attached to them by the format command (but not a value label). For example, 0 corresponds to 1960h1. In other words, you need to either:
tell xline() the number that corresponds to the date you want
use th() to figure out what that number is and force the evaluation inside xline().
use tline(), which is smart enough to understand dates.
I think the third is the best option.

SAS - get maximum of a variable and store as a column in the dataset

Say, for example, I have the following data
Dataset name: TheTable
Name Age Height
Peter 21 1.6
Alexa 19 1.8
Rob 23 1.3
I want to get the largest age and height, and store them in the table as follows
Dataset name: TheTableWithMax
Name Age Height MaxAge maxHeight
Peter 21 1.6 23 1.8
Alexa 19 1.8 23 1.8
Rob 23 1.3 23 1.8
The reason for this is that I have to draw a comparison between each variable and the maximum of that variable. How can I go about doing that in SAS?
I have thought of doing the following in order to obtain the max of, say, the Age column (the same process will be done for the Height column)
proc sort data = TheTable (obs=1) out=MaxAge (keep = Age);
by descending Age;
run;
However, I am not sure how I will be able to merge that back with the original TheTable to get TheTablewithMax.
Any assistance will be much appreciated.
This is very easily done with proc sql and summary functions, in your case max():
proc sql;
create table TheTableWithMax as
select *
,max(age) as MaxAge
,max(height) as MaxHeight
from TheTable
;
quit;

Modifying data in SAS: copying part of the value of a cell, adding missing data and labeling it

I have three different questions about modifying a dataset in SAS. My data contains: the day and the specific number belonging to the tag which was registred by an antenna on a specific day.
I have three separate questions:
1) The tag numbers are continuous and range from 1 to 560. Can I easily add numbers within this range which have not been registred on a specific day. So, if 160-280 is not registered for 23-May and 40-190 for 24-May to add these non-registered numbers only for that specific day? (The non registered numbers are much more scattered and for a dataset encompassing a few weeks to much to do by hand).
2) Furthermore, I want to make a new variable saying a tag has been registered (1) or not (0). Would it work to make this variable and set it to 1, then add the missing variables and (assuming the new variable is not set for the new number) set the missing values to 0.
3) the last question would be in regard to the format of the registered numbers which is along the line of 528 000000000400 and 000 000000000054. I am only interested in the last three digits of the number and want to remove the others. If I could add the missing numbers I could make a new variable after the data has been sorted by date and the original transponder code but otherwise what would you suggest?
I would love some suggestions and thank you in advance.
I am inventing some data here, I hope I got your questions right.
data chickens;
do tag=1 to 560;
output;
end;
run;
data registered;
input date mmddyy8. antenna tag;
format date date7.;
datalines;
01012014 1 1
01012014 1 2
01012014 1 6
01012014 1 8
01022014 1 1
01022014 1 2
01022014 1 7
01022014 1 9
01012014 2 2
01012014 2 3
01012014 2 4
01012014 2 7
01022014 2 4
01022014 2 5
01022014 2 8
01022014 2 9
;
run;
proc sql;
create table dates as
select distinct date, antenna
from registered;
create table DatesChickens as
select date, antenna, tag
from dates, chickens
order by date, antenna, tag;
quit;
proc sort data=registered;
by date antenna tag;
run;
data registered;
merge registered(in=INR) DatesChickens;
by date antenna tag;
Registered=INR;
run;
data registeredNumbers;
input Numbers $16.;
datalines;
528 000000000400
000 000000000054
;
run;
data registeredNumbers;
set registeredNumbers;
NewNumbers=substr(Numbers,14);
run;
I do not know SAS, but here is how I would do it in SQL - may give you an idea of how to start.
1 - Birds that have not registered through pophole that day
SELECT b.BirdId
FROM Birds b
WHERE NOT EXISTS
(SELECT 1 FROM Pophole_Visits p WHERE b.BirdId = p.BirdId AND p.date = ????)
2 - Birds registered through pophole
If you have a dataset with pophole data you can query that to find if a bird has been through. What would you flag be doing - finding a bird that has never been through any popholes? Looking for dodgy sensor tags or dead birds?
3 - Data code
You might have more joy with the SUBSTRING function
Good luck