I am trying to make a SAS line plot with Hour (0, 1, 2, ..., 24) on the X axis and Decline Rate on the Y axis.
I started my monitoring at Hour = 20 (8 PM), so I need the line plot to start at 20.
When the hour wraps around to 0, the line joins 0 back to 20, forming a straight line across the plot.
How can I handle this in SAS? I am using PROC GPLOT.
This is a difficulty for SAS, but one that can be managed.
I have two solutions:
Solution 1)
Keep the hour as a column in the data, but also add a date/time field so that the time value always increases.
Plot Decline Rate against the date/time field in GPLOT, but format the date/time field to show only the hour.
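A minimal sketch of Solution 1, assuming the readings are stored in monitoring order in a data set called RATES with variables HOUR and DECLINE_RATE (hypothetical names and values), and assuming an arbitrary start date of 01JAN2024. Each time the hour drops (for example from 23 to 0), the day is advanced, so the datetime keeps increasing; the TOD5. format then displays only the time of day.

```sas
/* Hypothetical sample data: HOUR and DECLINE_RATE, listed in monitoring order */
data rates;
  input hour decline_rate;
  datalines;
20 0.12
21 0.15
22 0.11
23 0.18
0 0.20
1 0.16
;
run;

data rates_dt;
  set rates;
  retain prev_hour;
  /* The hour dropped (e.g. 23 -> 0), so midnight has passed: move to the next day.
     DAY is created by the sum statement, so it starts at 0 and is retained. */
  if _n_ > 1 and hour < prev_hour then day + 1;
  prev_hour = hour;
  /* Assumed start date; any date works because only the time of day is displayed */
  obs_dt = dhms('01JAN2024'd + day, hour, 0, 0);
  format obs_dt tod5.;
  drop prev_hour day;
run;

/* Points are joined in increasing datetime order, so 0 follows 23 instead of preceding 20 */
symbol1 interpol=join;
proc gplot data=rates_dt;
  plot decline_rate*obs_dt;
run;
quit;
```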
Solution 2)
Add a new column to denote the order
data temp;
set temp;
order = _n_;
run;
Then sort by the new variable.
proc sort data=temp; by order; run;
Finally, use the sorted option within GPLOT. See the linked reference for further information: http://www.math.wpi.edu/saspdf/gref/c21.pdf
Related
I am building a process in SAS EG and came to a sticking point when I needed a running total. This would be very easy to do in Excel but my table is 22M records long. I have VBA experience but not Proc SQL. Can someone show me how to do a running total of dollars by item? The data is sorted by Market/Segment/Item/Month.
Thanks
Jeff
Your hierarchy is Market / Segment / Item, and perhaps, from the question, one can presume an Item is unique across all Markets and Segments.
A running total is easiest in a DATA step. You will want to use the FIRST. automatic variables, which are created when the step has a BY statement.
data want;
set have;
by Market Segment Item Month; * Month is included so that an error appears in the log if the incoming data is not in time order;
if first.Item then RunningDollars = 0;
RunningDollars + Dollars; * The + syntax here is a SUM statement, which automatically retains RunningDollars so its value carries over to the next record;
run;
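To make the retain behaviour of the sum statement explicit, here is a sketch of the same running total written with a RETAIN statement and the SUM function. The small HAVE data set and its values are hypothetical, for illustration only.

```sas
/* Hypothetical sample data, already sorted by Market/Segment/Item/Month */
data have;
  input Market $ Segment $ Item $ Month Dollars;
  datalines;
East Retail A 1 10
East Retail A 2 20
East Retail A 3 5
East Retail B 1 7
East Retail B 2 3
;
run;

data want;
  set have;
  by Market Segment Item Month;
  retain RunningDollars;                  /* keep the value from the previous record */
  if first.Item then RunningDollars = 0;  /* restart the total for each item */
  /* SUM() ignores missing values, matching the behaviour of the sum statement */
  RunningDollars = sum(RunningDollars, Dollars);
run;
```

For this sample, RunningDollars is 10, 30, 35 for item A and restarts at 7, 10 for item B, the same as the sum-statement version.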
I am using SAS and I have a column in my table that consists of various numbers. I want to go down the column and select a number if it is smaller than the highest number so far. I posted a picture of an example of what I am looking for. I also have a column with the year, which I didn't include in the picture, if that matters. I am guessing I will need some sort of loop. n is the original column and output is what I would like my loop to produce.
example:
n - current column
28
22
30
40
39
55
110
89
98
160
155
157
250
output - desired output
22
39
89
98
155
157
I attempted this in PROC SQL because I am new to SAS and know much more about SQL. While attempting it, I realized I am not going to be able to do it in PROC SQL.
Here is what I tried in proc sql.
I can post more things I have tried as I attempt more loops. As of now my loops are too far off.
proc sql;
select a.*
from homework a
full join homework b on a.make = b.make
and a.model = b.model
where a.'Initial Model Year'n < b.'Initial Model Year'n
and a.MPH < b.MPH;
quit;
Why always use SQL? SAS has a lot of facilities that are often better suited for the job than SQL. Newcomers to SAS tend to use the only thing they know from school, SQL, and neglect all the rest.
By definition, SQL is not suited for this job! SQL does not even guarantee that the order of rows is preserved, let alone let you use the order of the input rows in your logic. (Yes, there are SQL dialects that can do this, but not standard SQL.)
Use a data step. That reads in your data row by row, in the order they occur.
Avoid writing loops explicitly whenever you can. The data step implicitly loops over its input.
By default, the data step writes one row for each row read. You can remove a row from the output with a delete statement. You can also write explicit output statements; then only the rows for which you execute output will be in the output. (output is also used if you want more than one output row per input row.)
However, by default, row by row means it forgets the previous row and everything related to it. So you need to explicitly retain some information.
Attention: by default, SAS keeps all variables created in the step, including intermediate results, in the output data set. If you don't want that, you need either an explicit keep statement or a drop statement.
Example solution:
data MY_SELECTION;
set MY_INPUT;
retain largest 0; * largest is initialized to 0 for the first row only *;
if largest < number then largest = number;
else if number < largest then output;
drop largest;
run;
Final remark: by default, PROC SQL writes a report, while the data step creates a new data set. If you want SQL to behave like the data step, precede your query with create table MY_SELECTION as. If you want the data step to behave like SQL, insert proc print; before the run;
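Applied to the numbers from the question, the example might look like the sketch below. The column is called N here, as in the question; MY_INPUT and MY_SELECTION are the same placeholder names used above, and a PROC PRINT step shows the result as a report. RETAIN LARGEST 0 assumes all values are non-negative.

```sas
/* The values from the question, read several per line with @@ */
data MY_INPUT;
  input n @@;
  datalines;
28 22 30 40 39 55 110 89 98 160 155 157 250
;
run;

data MY_SELECTION;
  set MY_INPUT;
  retain largest 0;                 /* assumes all values are non-negative */
  if largest < n then largest = n;  /* new highest value so far: remember it, do not output */
  else if n < largest then output;  /* smaller than the highest so far: keep it */
  drop largest;
run;

proc print data=MY_SELECTION;
run;
```

This selects 22, 39, 89, 98, 155 and 157, which matches the desired output in the question.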
I just started learning SAS and would like some help understanding the following chunk of code. The following program computes the annual payroll by department.
proc sort data = company.usa out=work.temp;
by dept;
run;
data company.budget(keep=dept payroll);
set work.temp;
by dept;
if wagecat ='S' then yearly = wagerate *12;
else if wagecat = 'H' then yearly = wagerate *2000;
if first.dept then payroll=0;
payroll+yearly;
if last.dept;
run;
Questions:
What does out = work.temp do in the first line of this code?
I understand the data step creates 2 temporary variables for each BY variable (first.variable/last.variable) and their values are either 1 or 0, but what exactly do first.dept and last.dept do here in the code?
Why do we need payroll=0 after first.dept in the second to the last line?
This code takes the data for salaries and calculates the payroll amount for each department for a year, assuming salary is the same for all 12 months and that an hourly worker works 2000 hours.
It creates a copy of the data set which is sorted and stored in the work library. RTM.
From the docs
OUT= SAS-data-set
names the output data set. If SAS-data-set does not exist, then PROC SORT creates it.
CAUTION:
Use care when you use PROC SORT without OUT=.
Without the OUT= option, PROC SORT replaces the original data set with the sorted observations when the procedure executes without errors.
Default: Without OUT=, PROC SORT overwrites the original data set.
Tips: With in-database sorts, the output data set cannot refer to the input table on the DBMS.
You can use data set options with OUT=.
See: SAS Data Set Options: Reference
Example: Sorting by the Values of Multiple Variables
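As a quick check of what OUT= does, this sketch sorts a small data set into WORK.TEMP and leaves the input untouched. The data set name HAVE_DEPT and its values are made up for illustration.

```sas
/* Hypothetical input, deliberately not in DEPT order */
data have_dept;
  input dept $ wagerate;
  datalines;
B 10
A 20
B 30
;
run;

/* The sorted copy goes to WORK.TEMP; HAVE_DEPT keeps its original order (B, A, B) */
proc sort data=have_dept out=work.temp;
  by dept;
run;

/* Without OUT=, this would replace HAVE_DEPT itself with the sorted rows:
   proc sort data=have_dept; by dept; run; */
```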
First.DEPT is an indicator variable that flags the first observation of a specific BY group. So when you encounter the first record for a department, it is identified. Last.DEPT flags the last record for that department, which means the next record would be the first record for a different department.
It sets PAYROLL to 0 at the first record of each department, so each department's total starts fresh instead of carrying over the previous department's sum. Since you have if last.dept;, only the last record for each department is output. This code is not intuitive; it's a manual way to sum the wages for people in each department. The common way would be to use a summary procedure, such as MEANS/SUMMARY, but I assume they were trying to avoid having two passes of the data. Though if you're not sorting, it may be just as fast anyway.
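To see what FIRST.DEPT and LAST.DEPT hold on each row, a sketch like the following copies them into ordinary variables (the automatic variables themselves are not written to the output data set) next to the running PAYROLL. The departments and YEARLY amounts are hypothetical.

```sas
/* Hypothetical data, already sorted by DEPT */
data emp;
  input dept $ yearly;
  datalines;
A 100
A 200
B 50
B 70
B 30
;
run;

data trace;
  set emp;
  by dept;
  is_first = first.dept;           /* 1 on the first row of each department */
  is_last  = last.dept;            /* 1 on the last row of each department */
  if first.dept then payroll = 0;  /* reset the total for a new department */
  payroll + yearly;                /* sum statement: PAYROLL is retained */
run;

proc print data=trace;
run;
```

PAYROLL runs 100, 300 for A and 50, 120, 150 for B. Adding if last.dept; would keep only the rows where IS_LAST is 1, that is 300 for A and 150 for B.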
Again, RTM here. The SAS documentation is quite thorough on these beginner topics.
Here's an alternative method that should generate the exact same results but is more intuitive IMO.
data temp;
set company.usa;
if wagecat='S' then factor=12; *salary in months;
else if wagecat='H' then factor=2000; *salary in hours;
run;
proc means data=temp noprint NWAY;
class dept;
var wagerate;
weight factor;
output out=company.budget(drop=_type_ _freq_) sum(wagerate)=payroll;
run;
I am currently running a macro code in SAS and I want to do a calculation with regards to max and min. Right now the line of code I have is :
hhincscaled = 100*(hhinc - min(hhinc) )/ (max(hhinc) - min(hhinc));
hhvaluescaled = 100*(hhvalue - min(hhvalue))/ (max(hhvalue) - min(hhvalue));
What I am trying to do is re-scale the household income and value variables with the calculations above. For each variable, I am trying to subtract its minimum value from each value, divide by the range (maximum minus minimum), and then scale the result by multiplying by 100. I'm not sure if this is the right way or if SAS is interpreting the code the way I want it.
I assume you are in a Data Step. A Data Step has an implicit loop over the records in the data set. You only have access to the record of the current loop (with some exceptions).
The "SAS" way to do this is to calculate the min and max values and then add them to your data set.
Proc sql noprint;
create table want as
select *,
min(hhinc) as min_hhinc,
max(hhinc) as max_hhinc,
min(hhvalue) as min_hhvalue,
max(hhvalue) as max_hhvalue
from have;
quit;
data want;
set want;
hhincscaled = 100*(hhinc - min_hhinc )/ (max_hhinc - min_hhinc);
hhvaluescaled = 100*(hhvalue - min_hhvalue)/ (max_hhvalue - min_hhvalue);
/*Delete this if you want to keep the min max*/
drop min_: max_:;
run;
Another SAS way of doing this is to create the max/min table with PROC MEANS (or PROC SUMMARY or your choice of alternatives) and merge it on. Doesn't require SQL knowledge to do, and probably about the same speed.
proc means data=have;
*use a class value if you have one;
var hhinc hhvalue;
output out=minmax min= max= /autoname;
run;
data want;
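if _n_=1 then set minmax(drop=_type_ _freq_); *read the min/max values once - variables read with SET are retained, so they are available on every row;
set have;
hhincscaled = 100*(hhinc - hhinc_Min)/(hhinc_Max - hhinc_Min);
hhvaluescaled = 100*(hhvalue - hhvalue_Min)/(hhvalue_Max - hhvalue_Min);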
run;
If you have a class variable (i.e., a grouping like 'by state' or similar), add it to a CLASS statement in PROC MEANS, and then, instead of the second SET in WANT, do a MERGE by your class variable. The initial data set must be sorted by that variable to merge.
You also have the option of doing this in SAS/IML, which works more like the way you are thinking above. IML is the SAS Interactive Matrix Language and is more similar to R or MATLAB than the SAS base language.
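For the grouped case, a sketch might look like this. STATE is a hypothetical grouping variable and the HAVE values are made up; NWAY keeps only the per-state rows in MINMAX, AUTONAME produces names such as hhinc_Min and hhinc_Max, and the CLASS output comes out in ascending STATE order by default, so it can be merged directly.

```sas
/* Hypothetical data with a grouping variable STATE */
data have;
  input state $ hhinc hhvalue;
  datalines;
NY 10 100
NY 20 300
NY 30 200
TX 5 50
TX 15 150
;
run;

proc sort data=have;
  by state;
run;

/* One row of min/max values per state */
proc means data=have noprint nway;
  class state;
  var hhinc hhvalue;
  output out=minmax(drop=_type_ _freq_) min= max= / autoname;
run;

/* One-to-many merge: each state's min/max values are carried onto all of its rows */
data want;
  merge have minmax;
  by state;
  hhincscaled   = 100*(hhinc   - hhinc_Min)   / (hhinc_Max   - hhinc_Min);
  hhvaluescaled = 100*(hhvalue - hhvalue_Min) / (hhvalue_Max - hhvalue_Min);
  drop hhinc_Min hhinc_Max hhvalue_Min hhvalue_Max;
run;
```

Note that a group where the max equals the min would divide by zero; SAS sets the result to missing and writes a note to the log.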
(first time posting)
I have a data set where I need to create a new variable (in SAS), based on meeting a condition related to another variable. So, the data contains three variables from a survey: Site, IDnumb (person), and Date. There can be multiple responses from different people at the same site (see persons 1 and 3 at site A).
Site IDnumb Date
a 1 6/12
b 2 3/4
c 4 5/1
a 3 .
d 5 .
I want to create a new variable called Complete, but it can't contain duplicates. So, when I go to proc freq, I want site A to be counted once, using the 6/12 Date of the Completed Survey. So basically, if a site is represented twice and contains a Date in one, I want to only count that one and ignore the duplicate site without a date.
N %
Complete 3 75%
Last Month 1 25%
My question may be around the NODUP and NODUPKEY possibilities. If I do a Proc Sort (nodupkey) by Site and Date, would that eliminate obs "a 3 ."?
Any help would be greatly appreciated. Sorry for the jumbled "table", as this is my first post (hints on making that better are also welcomed).
You can do this a number of ways.
First off, you need a complete/not complete binary variable. If you're in the datastep anyway, might as well just do it all there.
proc sort data=yourdata;
by site descending date;
run;
data yourdata_want;
set yourdata;
by site descending date;
if first.site then do;
comp = ifn(not missing(date),1,0); *1 = has a date (complete), 0 = no date;
output;
end;
run;
proc freq data=yourdata_want;
tables comp;
run;
If you used NODUPKEY, you'd first sort it by SITE DESCENDING DATE, then by SITE with NODUPKEY. That way the latest date is on top. You could also format COMP to show the text labels you listed rather than just 1/0.
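A sketch of formatting COMP with those labels. To keep it self-contained, YOURDATA_WANT is typed in directly with the one-row-per-site values that the data step above would produce for the question's data; the format name COMPFMT is an arbitrary choice.

```sas
/* One row per site, as produced by the data step above for the question's data */
data yourdata_want;
  input site $ comp;
  datalines;
a 1
b 1
c 1
d 0
;
run;

proc format;
  value compfmt
    1 = 'Complete'
    0 = 'Last Month';
run;

/* PROC FREQ groups and labels by the formatted values */
proc freq data=yourdata_want;
  tables comp;
  format comp compfmt.;
run;
```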
You can also do it with a format on DATE, so you can skip the data step (still need the sort/sort nodupkey). Format all nonmissing values of DATE to "Complete" and missing value of date to "Last Month", then include the missing option in your proc freq.
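A sketch of the format-on-DATE route, including the sort and sort NODUPKEY steps. The year 2020 is an assumption (the question shows only month/day), and the names SORTED, ONE_PER_SITE and STATUSFMT are hypothetical. NODUPKEY keeps the first row of each SITE group, which after the descending sort is the dated one.

```sas
/* The question's data; the year 2020 is assumed */
data yourdata;
  input Site $ IDnumb Date :mmddyy10.;
  format Date mmddyy10.;
  datalines;
a 1 06/12/2020
b 2 03/04/2020
c 4 05/01/2020
a 3 .
d 5 .
;
run;

/* Latest date first within each site; missing dates sort last when descending */
proc sort data=yourdata out=sorted;
  by Site descending Date;
run;

/* Keep one row per site: the first one, i.e. the dated row if there is one */
proc sort data=sorted out=one_per_site nodupkey;
  by Site;
run;

proc format;
  value statusfmt
    .     = 'Last Month'
    other = 'Complete';
run;

/* MISSING includes the missing dates in the table and in the percentages */
proc freq data=one_per_site;
  tables Date / missing;
  format Date statusfmt.;
run;
```

With the question's data this gives Complete 3 (75%) and Last Month 1 (25%).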
Finally, you could do the table in SQL (though getting two rows like that is a bit harder, you have to UNION two queries together).
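A sketch of the SQL route with UNION ALL, under the same assumption as before that the year is 2020. It produces the two count rows but not the percentages, which would need an extra step. The table name STATUS_COUNTS is hypothetical; LENGTH=10 keeps both labels at the same width.

```sas
data yourdata;
  input Site $ IDnumb Date :mmddyy10.;
  format Date mmddyy10.;
  datalines;
a 1 06/12/2020
b 2 03/04/2020
c 4 05/01/2020
a 3 .
d 5 .
;
run;

proc sql;
  create table status_counts as
  /* sites with at least one dated response */
  select 'Complete' as Status length=10,
         count(distinct Site) as N
    from yourdata
    where Date is not missing
  union all
  /* sites that never have a date */
  select 'Last Month' as Status length=10,
         count(distinct Site) as N
    from yourdata
    where Site not in (select Site from yourdata where Date is not missing);
quit;

proc print data=status_counts;
run;
```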