I'm trying to create a running total of Sales by year. I read in the data and created a new variable, Year. I tried using the RETAIN statement, but I don't know how to reset the total to 0 when a new year starts.
data WeeklySales5;
infile datalines dlm=',' DSD firstobs=2;
input Date :MMDDYY10. Sales :Dollar8.;
format Date :MMDDYY10.;
year = year(Date);
retain TotalSales;
TotalSales = sum(TotalSales,Sales);
datalines;
Date, Sales
1/5/2010,"$7,580.42 "
1/12/2010,"$7,753.55 "
10/22/2013,"$9,545.17 "
10/29/2013,"$9,323.54 "
5/12/2015,"$8,678.97 "
5/19/2015,"$8,601.38 "
;
run;
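Read the data first, then compute the running total in a second DATA step that uses BY-group processing to reset the total at the first observation of each year. Something like this: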
data WeeklySales5;
infile datalines dlm=',' DSD firstobs=2;
input Date :MMDDYY10. Sales :Dollar8.;
year = year(Date);
format Date MMDDYY10.;
datalines;
Date, Sales
1/5/2010,"$7,580.42 "
1/12/2010,"$7,753.55 "
10/22/2013,"$9,545.17 "
10/29/2013,"$9,323.54 "
5/12/2015,"$8,678.97 "
5/19/2015,"$8,601.38 "
;
run;
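data want;
set WeeklySales5;
by year; /* creates first.year; the input must already be ordered by year */
if first.year then TotalSales = 0; /* reset at the start of each year */
TotalSales = sum(TotalSales,Sales); /* sum() ignores missing values */
retain TotalSales; /* carry the total across observations */
run;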
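The BY statement above relies on the rows already being in year order, which holds for the sample data. As a minimal sketch for data that might not be sorted, the step could be preceded by a PROC SORT. The output name WeeklySales5_sorted is made up for illustration, and the sum statement is swapped in here as a shorter alternative to RETAIN plus sum():

```sas
/* Sketch only: WeeklySales5_sorted is a hypothetical dataset name. */
proc sort data=WeeklySales5 out=WeeklySales5_sorted;
  by year Date;
run;

data want;
  set WeeklySales5_sorted;
  by year;
  if first.year then TotalSales = 0;  /* reset at each new year */
  TotalSales + Sales;                 /* sum statement: retains TotalSales automatically */
run;
```

The sum statement treats a missing Sales as 0, so the running total does not turn missing partway through a year.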
Related
I would like to add a day to a SAS date and save that value as a new variable. I have these data:
data have;
input ID $ date;
datalines;
A 14610
B 13229
C 15644
D 14278
;
run;
And I would like to end up with these data, with the new variables labelled for each iteration as below:
data want;
input ID $ date date1 date2 date3;
datalines;
A 14610 14611 14612 14613
B 13229 13230 13231 13232
C 15644 15645 15646 15647
D 14278 14279 14280 14281
;
run;
How can I accomplish this?
Try this, which uses an array to fill date1 through date3:
data have;
input ID $ date;
datalines;
A 14610
B 13229
C 15644
D 14278
;
run;
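data want;
set have;
/* explicit array over the three new variables */
array d{3} date1 - date3;
do i = 1 to dim(d);
d{i} = date + i; /* date1 = date + 1, date2 = date + 2, ... */
end;
drop i;
run;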
I have this csv dataset named Movie:
ID,Underage,Name,Rating,Year, Rank on IMDb ,
M1021,,Elanor, Melanor,12,1879,5
M1203,Yes,IT,12,1999,1,
M0081,,Cars 2,13,1999,2,
M1371,No,Kiminonawa,12,2017,3,
M3416,,Living in the past, fading future,13,2018,12
I would like to import Movie into SAS such that "Elanor, Melanor" is the Name instead of 'Elanor' being under Name while 'Melanor' being in Rating.
I tried the following code:
FILENAME XX '....Movie.csv';
data movieYY (drop=DLM1at field2);
infile XX dlm=',' firstobs=2 dsd;
format ID $5. Underage $3. Name $50. Year 4. Rating $3. 'Rank on IMDb'n 2.;
input @;
DLM1at = find(_INFILE_, ',');
length field2 $4;
field2 = substr(_INFILE_, DLM1at + 1, 4);
if lengthn(compress(field2, '1234567890')) ne 0 then do;
_INFILE_ = substr(_INFILE_, 1, dlm1at - 1) || ' ' ||
substr(_INFILE_, dlm1at + 1);
end;
input ID Underage Name Year Rating 'Rank on IMDb'n;
run;
May I know what I should do? I am still a beginner in SAS. Thank you!
Add quotes around the name of each movie, or use another delimiter. Any value in a delimited file that contains the delimiter itself must be enclosed in quotes. For example:
data foo;
infile datalines dlm="," dsd;
length id 8 name $25;
input id name $;
datalines;
1,"Smith, John"
2,"Cage, Nicolas"
;
run;
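The other option mentioned, a different delimiter, might look like the sketch below. It assumes the file can be re-exported with a pipe character between fields; the dataset name foo_pipe is made up for illustration.

```sas
/* Sketch only: assumes the source can be written with | as the delimiter. */
data foo_pipe;
  infile datalines dlm='|' dsd;
  length id 8 name $25;
  input id name $;
  datalines;
1|Smith, John
2|Cage, Nicolas
;
run;
```

Because the comma is no longer the delimiter, the names need no quotes.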
I'm looking to take an observation's date and keep rolling it forward by its specified repricing frequency until a target date.
The dataset being used is:
data have;
input repricing_frequency date_of_last_repricing end_date;
datalines;
3 15399 21367
10 12265 21367
15 13879 21367
;
format date_of_last_repricing end_date date9.;
informat date_of_last_repricing end_date date9.;
run;
So the idea is that I'd keep applying the repricing frequency of either 3 months, 10 months or 15 months to date_of_last_repricing until it is as close as it can be to the date "31DEC2017". Thanks in advance.
EDIT including my recent workings:
data want;
set have;
repricing_N = intck('Month',date_of_last_repricing,'31DEC2017'd,'continuous');
dateoflastrepricing = intnx('Month',date_of_last_repricing,repricing_N,'E');
format dateoflastrepricing date9.;
informat dateoflastrepricing date9.;
run;
The INTNX function computes an incremented date value and lets you specify the alignment of the result within the interval (in your case, the 'end' of the month n months ahead).
data have;
format date_of_last_repricing end_date date9.;
informat date_of_last_repricing end_date date9.;
* use 12. to read the raw date values in the datalines;
input repricing_frequency date_of_last_repricing: 12. end_date: 12.;
datalines;
3 15399 21367
10 12265 21367
15 13879 21367
;
run;
data want;
set have;
status = 'Original';
output;
* increment and iterate;
date_of_last_repricing = intnx('month',
date_of_last_repricing, repricing_frequency, 'end'
);
do while (date_of_last_repricing <= end_date);
status = 'Computed';
output;
date_of_last_repricing = intnx('month',
date_of_last_repricing, repricing_frequency, 'end'
);
end;
run;
If you only want the nearest end date, that is, the last date the iteration by repricing frequency would reach, you do not have to iterate. You can divide the number of months apart by the frequency to get the number of iterations that would have occurred.
data want2;
set have;
nearest_end_month = intnx('month', end_date, 0, 'end');
if nearest_end_month > end_date then nearest_end_month = intnx('month', nearest_end_month, -1, 'end');
months_apart = intck('month', date_of_last_repricing, nearest_end_month);
iterations_apart = floor(months_apart / repricing_frequency);
iteration_months = iterations_apart * repricing_frequency;
nearest_end_date = intnx('month', date_of_last_repricing, iteration_months, 'end');
format nearest: date9.;
run;
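proc sql;
/* the sample data has no id variable, so repricing_frequency (unique per row here) is used as the key */
select repricing_frequency, max(date_of_last_repricing) as nearest_end_date format=date9. from want group by repricing_frequency;
select repricing_frequency, nearest_end_date from want2;
quit;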
I have the following code. Although I entered 30jun1983, it gets saved as 30/jun/2020. Also, the second value is read only when there are two spaces between the date values in the cards; with a single space it is read as missing.
DATA DIFFERENCE;
infile cards dlm=',' dsd;
INPUT DATE1 DATE9. Dt2 DATE9.;
FORMAT DATE1 DDMMYY10. Dt2 DDMMYY10.;
DIFFERENCE=YRDIF(DATE1,Dt2,'ACT/ACT');
DIFFERENCE=ROUND(DIFFERENCE);
CARDS;
11MAY2009 30jun1983
;
RUN;
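You need the colon modifier before each informat in your INPUT statement, so that SAS reads each value up to the next delimiter instead of a fixed number of columns. You also need a comma in your datalines, since you specified a comma as your DLM (delimiter):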
DATA DIFFERENCE;
infile cards dlm=',' dsd;
INPUT DATE1 :DATE9. Dt2 :DATE9.;
FORMAT DATE1 DDMMYY10. Dt2 DDMMYY10.;
DIFFERENCE=YRDIF(DATE1,Dt2,'ACT/ACT');
DIFFERENCE=ROUND(DIFFERENCE);
CARDS;
11MAY2009,30jun1983
;
RUN;
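If the values should stay separated by a space, as in the question's data line, a minimal sketch is to drop the DLM= and DSD options and keep the colon modifiers, since a blank is the default delimiter for list input. The dataset name difference_space is made up for illustration.

```sas
/* Sketch only: blank-delimited list input, no DLM= or DSD needed. */
data difference_space;
  input date1 :date9. dt2 :date9.;
  format date1 dt2 ddmmyy10.;
  difference = round(yrdif(date1, dt2, 'ACT/ACT'));
  cards;
11MAY2009 30jun1983
;
run;
```

With list input, one or several spaces between the two dates are read the same way.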
I have a SAS script that outputs a lot of regression tables:
FILENAME RegProj URL "http://www.math.tau.ac.il/~liadshek/Long.txt" ;
DATA book;
length country $20;
INFILE RegProj firstobs=2 dlm=" " LRECL=131072 dsd truncover;
INPUT Country$ Year GDP_per_capita Infant_Mortality_Rate;
log_IMR = log(infant_mortality_rate);
log_gdp = log(GDP_per_capita);
RUN;
PROC reg data=book;
by year;
MODEL log_IMR = log_gdp;
output out = reg1;
run;
How can I print all the coefficients in one table?
I think you're looking for the OUTEST= option. Try the following:
PROC reg data=book outest = reg_estimates;
by year;
MODEL log_IMR = log_gdp;
output out = reg1;
run;
This gives you all the regression coefficients in one table.
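To look at that table, something like the following should work. It is a sketch that assumes the intercept column in the OUTEST= data set carries its usual name, Intercept, and that the slope is stored under the regressor's name, log_gdp.

```sas
/* Sketch only: lists one row of estimates per year from the OUTEST= data set. */
proc print data=reg_estimates noobs;
  var year Intercept log_gdp;
run;
```

The BY statement in PROC REG expects book to be sorted by year, so a PROC SORT beforehand may be needed if the file is not already in that order.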