Task: Construct a forecast with a linear or quadratic deterministic trend for the future number of employees in US retail trade.
Link to the data:
https://fred.stlouisfed.org/series/USTRADE?fbclid=IwAR0-YU8nyc8iQCLOjd9DxRsU8G_GNEWEow_g2a6ob1cP0bu_xQSXO1FFPi8
Once the data is downloaded, I use this code to load it:
Employees_data <- read.csv(file.choose(), header=TRUE, sep = ",")
Then I realize my date column is stored as a factor, so I use this code:
Employees_data$DATE <- as.Date(Employees_data$DATE,"%Y-%m-%d")
Then I try to run lm(), but realize I need two numeric variables:
LinearModel <- lm(Employees_data$USTRADE~Employees_data$DATE)
So I convert the date to numeric using:
Employees_data$DATE <- as.numeric(Employees_data$DATE)
But now I get negative numeric values in my date column:
'data.frame': 968 obs. of 2 variables:
$ DATE : num -11323 -11292 -11264 -11233 -11203 ...
$ USTRADE: num 3108 3122 3140 3155 3152 ...
When I convert the date column to numbers in Excel, I get these values:
Date Employees
14.246 31.071
14.277 31.211
...
The numbers in the Date column are day counts (Excel displays them with '.' as the thousands separator and ',' as the decimal mark, so 14.246 is day 14246). With these, I can run the regression in Excel using Employees as Y and Date as X and get a lovely result:
Coefficients
Intercept -48685,09628
Date 5,079443779
How can I do this in R? I'm trying to learn R.
Any help is appreciated :)
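The negative values are expected rather than a problem. R's as.numeric() on a Date returns days since 1970-01-01, so dates before 1970 come out negative, while Excel's 1900 date system counts days from an origin that works out to 1899-12-30 for dates after February 1900. The two day counts differ by a constant 25569, which shifts the fitted intercept but not the slope, so lm() on the numeric DATE column is fine as it is. The sketch below checks that arithmetic in Python rather than R, using the first five observations shown by str() above; the helper names r_days, excel_serial and ols are illustrative only.

```python
from datetime import date

# R's as.numeric(<Date>) counts days since 1970-01-01.
R_ORIGIN = date(1970, 1, 1)
# Excel's 1900 system behaves as if day 0 were 1899-12-30 for dates after Feb 1900.
EXCEL_ORIGIN = date(1899, 12, 30)


def r_days(d: date) -> int:
    """Day number as R reports it."""
    return (d - R_ORIGIN).days


def excel_serial(d: date) -> int:
    """Day number as Excel reports it."""
    return (d - EXCEL_ORIGIN).days


def ols(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


# First five monthly observations from the str() output (values as displayed there).
months = [date(1939, m, 1) for m in range(1, 6)]
employees = [3108, 3122, 3140, 3155, 3152]

x_r = [r_days(d) for d in months]             # -11323, -11292, -11264, -11233, -11203
x_excel = [excel_serial(d) for d in months]   # 14246, 14277, 14305, 14336, 14366

a_r, b_r = ols(x_r, employees)
a_x, b_x = ols(x_excel, employees)
print("R scale:    ", a_r, b_r)
print("Excel scale:", a_x, b_x)  # same slope; intercepts differ by slope * 25569

# A linear-trend forecast just evaluates the line at a future date on the same scale.
future = date(1939, 6, 1)
print("forecast:", a_r + b_r * r_days(future))
```

For the quadratic trend the same reasoning applies with an added squared term (in R, a formula such as USTRADE ~ DATE + I(DATE^2)); with day counts in the thousands, centring the date before squaring keeps the fit better conditioned.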
When I imported my Excel sheet, some dates were imported differently from others. I tried to fix this with the code below to format the date.
DATA volume;
SET mice.volume;
format Date MMDDYY10.;
run;
However, I received the following error.
ERROR 48-59: The format $MMDDYY was not found or could not be loaded.
I also tried the following code:
DATA volume;
SET mice.volume;
If date= 44138 then date= '11/3';
If date= 44141 then date= '11/6';
run;
NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).
PROC CONTENTS shows: Variable=Date, Type=Char, Len=7, Format=$7., Informat=$7., Label=Date
How do I fix this?
Because the Date column is character and holds a mix of date-looking strings and Excel date numbers, some of the date values in your Excel sheet are probably actually strings, such as '11/10 or ='11/10'.
The raw number 44138 is:
as a SAS date value, 04-NOV-2080 (obviously not what is wanted)
as an Excel date value, 03-NOV-2020 (aha!)
03-NOV-2020 as a SAS date value is 22222,
an offset of -21916 from Excel,
and -21916 is the SAS date 30-DEC-1899.
Date Epochs
An epoch is the date corresponding to the number 0 in a system's calendar. The SAS base year is 1960 and the Excel base year is 1900.
Number   Actual Date   Formatted Date Shown   System / Format
------   -----------   --------------------   ---------------
     0   31-DEC-1899   1/0/1900               Excel / Short Date (the formatter is weird at the epoch)
     0   01-JAN-1960   01-JAN-1960            SAS / DATE11.
 21916   01-JAN-1960   1/1/1960               Excel / Short Date
-21916   30-DEC-1899   30-DEC-1899            SAS / DATE11.
Notice that the round trip goes from 31-DEC-1899 to 30-DEC-1899. This is due to an Excel 97 bug that has been carried forward for legacy reasons; see Microsoft's explanation in "Excel incorrectly assumes that the year 1900 is a leap year", which pushes the blame back even further, to Lotus 1-2-3.
To convert a date number from system S1 to system S2, add S2's number for S1's epoch date (the date that S1 numbers as 0). For Excel to SAS:
SAS#(date) = Excel#(date) + SAS#(Excel Epoch Date) - 1 (Excel leap year bug), or
sas_dt = excel_dt + '31-DEC-1899'd - 1; *or;
sas_dt = excel_dt + '30-DEC-1899'd;
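As a cross-check of the arithmetic above, here is a small Python sketch (not SAS) that models a SAS date value as the number of days since 1960-01-01 using Python's proleptic Gregorian datetime.date. The function names are made up for illustration; it reproduces the 44138 to 22222 example and both forms of the offset.

```python
from datetime import date, timedelta

SAS_EPOCH = date(1960, 1, 1)  # SAS date value 0


def sas_number(d: date) -> int:
    """SAS date value: days since 01-JAN-1960 (negative before it)."""
    return (d - SAS_EPOCH).days


def sas_to_date(n: int) -> date:
    """Calendar date for a SAS date value."""
    return SAS_EPOCH + timedelta(days=n)


def excel_to_sas(excel_dt: int) -> int:
    """sas_dt = excel_dt + '30-DEC-1899'd (valid for Excel dates after Feb 1900)."""
    return excel_dt + sas_number(date(1899, 12, 30))


print(sas_number(date(1899, 12, 30)))      # -21916
print(sas_number(date(1899, 12, 31)) - 1)  # -21916, the "'31-DEC-1899'd - 1" form
print(sas_to_date(44138))                  # 2080-11-04: the raw number read as a SAS date
print(excel_to_sas(44138))                 # 22222
print(sas_to_date(excel_to_sas(44138)))    # 2020-11-03
```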
What happened
Mixed value types in the Excel date column forced PROC IMPORT to treat the date variable as character.
Excel cells containing a date-looking m/d string were brought in as that string.
Excel cells containing a true date, likely custom formatted as m/d, were brought in as the underlying Excel date number, stored as text.
The ERROR
You tried to apply the date format MMDDYY. to the character variable Date.
A character column cannot be assigned a numeric or date format, so you get the error
ERROR 48-59: The format $MMDDYY was not found or could not be loaded.
SAS assumed that MMDDYY. meant the character format $MMDDYY. because the variable is character.
The Fix
You can convert the values in the character date column with code such as the following (untested):
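data volume;
  set mice.volume;
  /* strings such as '11/3' have no year: append one, then read as mm/dd/yyyy */
  if index(date, '/') then
    date_fixed = input(trim(date) || '/2020', mmddyy10.);
  /* digit strings are Excel date numbers: shift them to the SAS epoch */
  else
    date_fixed = input(date, best12.) + '30-DEC-1899'd;
  format date_fixed yymmdd10.;
run;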
If you want to continue showing only mm/dd in SAS, use the format NLDATEM5.
format date_fixed NLDATEM5.;
Currently I have a date column in time format. I want to change it to a datetime stamp format, i.e. I want the date column to look like 12nov 2020 12:03:45:00.
Could someone help me with this?
According to @KurtBremser:
SAS dates are counts of days, SAS datetimes are counts of seconds.
datetime = dhms(date,0,0,0);
converts a date to a datetime. Alternatively, multiply the date by 86400.
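The equivalence of the two approaches is plain arithmetic: a day has 86400 seconds, so a SAS date (days since 1960-01-01) times 86400 is the SAS datetime (seconds since 1960-01-01 00:00:00) at midnight of that day. The Python sketch below models this; dhms here is a hypothetical stand-in for the SAS function of the same name, and 22222 is the SAS date for 03-NOV-2020.

```python
from datetime import datetime, timedelta

SAS_EPOCH = datetime(1960, 1, 1)  # SAS datetime value 0 (and SAS date value 0)
SECONDS_PER_DAY = 86400


def dhms(date_value: int, h: int, m: int, s: int) -> int:
    """Python stand-in for SAS DHMS(): a SAS date plus a time, as datetime seconds."""
    return date_value * SECONDS_PER_DAY + h * 3600 + m * 60 + s


def from_sas_datetime(dt_value: int) -> datetime:
    """Calendar datetime for a SAS datetime value."""
    return SAS_EPOCH + timedelta(seconds=dt_value)


d = 22222  # SAS date value for 03-NOV-2020
print(dhms(d, 0, 0, 0))                         # 1919980800
print(dhms(d, 0, 0, 0) == d * SECONDS_PER_DAY)  # True
print(from_sas_datetime(dhms(d, 12, 3, 45)))    # 2020-11-03 12:03:45
```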
A column showing a time representation hh:mm:ss can be one of three things:
1. A character column containing the digit characters 0-9 and :
2. A numeric column containing a SAS time value, displayed as hh:mm:ss with the time format TIME8.
3. A numeric column containing a SAS datetime value, displayed as hh:mm:ss with the time-of-day format TOD.
This sample program demonstrates how different kinds of values can all look the same when viewed.
data have;
v1 = '12:34:56';
v2 = hms(12,34,56);
v3 = dhms(today(),12,34,56);
put v1= / v2= time8. / v3=tod. / v3=datetime18.;
run;
------ LOG ------
v1=12:34:56
v2=12:34:56
v3=12:34:56
v3=25NOV20:12:34:56
Only #3 has enough information in the raw value to be formatted as ddmmmyy:hh:mm:ss:
format myDate datetime18.;
#2 requires computing a new value, assuming something about the date part:
* supposing myDate contains only time values (00:00:00 to 23:59:59) for today;
myNewDate = dhms(today(),0,0,0) + myDate;
format myNewDate datetime18.;
#1 requires interpretation through INPUT and an assumption about the date:
* supposing myDate contains "hh:mm:ss" for today;
myNewDate = dhms(today(),0,0,0) + input(myDate,time8.);
format myNewDate datetime18.;
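Cases #1 and #2 both come down to "date part in seconds plus time of day in seconds". The Python sketch below imitates them under stated assumptions: parse_time plays the role of INPUT(myDate, TIME8.), combine plays the role of dhms(day,0,0,0) + time, the fixed day 2020-11-25 stands in for today() (matching the sample log above), and show is only a rough imitation of DATETIME18. text. All names are hypothetical.

```python
from datetime import date, datetime, timedelta

SAS_EPOCH = date(1960, 1, 1)
MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]


def sas_date(d: date) -> int:
    """SAS date value: days since 01-JAN-1960."""
    return (d - SAS_EPOCH).days


def parse_time(text: str) -> int:
    """Case #1: read 'hh:mm:ss' into a time value (seconds since midnight)."""
    h, m, s = (int(part) for part in text.split(":"))
    return h * 3600 + m * 60 + s


def combine(day: date, time_seconds: int) -> int:
    """Case #2: dhms(day,0,0,0) + time value gives SAS datetime seconds."""
    return sas_date(day) * 86400 + time_seconds


def show(dt_seconds: int) -> str:
    """Rough imitation of DATETIME18. text: ddMONyy:hh:mm:ss."""
    dt = datetime(1960, 1, 1) + timedelta(seconds=dt_seconds)
    return f"{dt.day:02d}{MONTHS[dt.month - 1]}{dt.year % 100:02d}:{dt:%H:%M:%S}"


assumed_day = date(2020, 11, 25)  # assumption standing in for today()
t = parse_time("12:34:56")
print(t)                             # 45296
print(show(combine(assumed_day, t)))  # 25NOV20:12:34:56
```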
I'm doing a connected twoway plot with x-axis as dates formatted as %th with values 2011h1 to 2017h2. I want to put a vertical line at 2016h2 but nothing I've tried has worked.
xline(2016h2)
xline("2016h2")
xline(date==2016h2)
xline(date=="2016h2")
I'm thinking it might be because I formatted dates with
gen date = yh(year, half)
format date %th
I think this is a MWE:
age1820 date
10.42 2011h1
10.33 2011h2
11.66 2012h1
11.01 2012h2
14.29 2013h1
10.95 2013h2
12.42 2014h1
7.04 2014h2
7.07 2015h1
6.95 2015h2
4 2016h1
8.07 2016h2
5.98 2017h1
3.19 2017h2
graph twoway connected age1820 date, xline(2016h2)
Your example will not really work as written without some additional work. I think in future posts you may want to shoot for a fully working example to maximize the chance that you get a good answer quickly. This is why I made up some fake data below.
Try something like this:
clear
set obs 20
gen date = _n + 100
format date %th
gen age = _n*2
display %th 116
display %th 117
tw connected age date, xline(116 `=th(2018h2)') tline(2019h1)
The crux of the matter is that Stata stores dates as integers, and the format command attaches a special display label to them (but not a value label). For example, 0 corresponds to 1960h1. In other words, you need to do one of the following:
1. tell xline() the number that corresponds to the date you want,
2. use th() to compute that number and force its evaluation inside xline(), as with `=th(2018h2)' in the code above, or
3. use tline(), which understands date literals such as 2019h1.
I think the third is the best option.
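To see which integer option 1 needs, it helps to reproduce the half-year encoding (0 = 1960h1, counting up by one per half-year). The Python sketch below is only an illustration of that arithmetic, not Stata code; th and th_label are hypothetical helpers named after Stata's th() function and %th display. For the question's data, 2016h2 works out to 113, so xline(113) should mark it.

```python
def th(year: int, half: int) -> int:
    """Stata %th value: half-years elapsed since 1960h1 (which is 0)."""
    if half not in (1, 2):
        raise ValueError("half must be 1 or 2")
    return (year - 1960) * 2 + (half - 1)


def th_label(value: int) -> str:
    """Inverse: the label `display %th value` would show."""
    year_offset, half_index = divmod(value, 2)
    return f"{1960 + year_offset}h{half_index + 1}"


print(th(2016, 2))    # 113
print(th(2018, 2))    # 117
print(th_label(116))  # 2018h1
```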
I have a file from Excel that is in a short date format, but when SAS reads it in, it turns the dates into numbers in the 40000 range. When I try to convert these with the code below, the year becomes 2077. Is there a way to make the dates keep their original format on read-in, or to avoid these numbers, which are nowhere near the 2017 and 2018 years my file starts in? Does that make sense?
data change_date;
format Completed_Date mmddyy8. ;
set check;
completed_date = date_completed;
if 42005 => date_completed >=43466 and date_completed ^=. then
Completed_date = Date_Completed-21916; *commented out 12-21-17 Xalka
dates back to how they are expected;
run;
I am pretty sure this is a duplicate question, but I can't find it.
This is usually caused by mixing character and date values in the same column. That makes SAS import the column as a character variable, and the actual dates are copied as character versions of the integers that Excel uses to store dates.
Frequently this is caused by entries that look like dates but are really character strings in the Excel file. The best way to fix it is to fix the Excel file so that the column only contains dates. Otherwise, you need to convert the strings to integers and adjust the values to account for the different base dates.
So if your values are in a SAS dataset named HAVE, in the character variable DATESTRING, you could use this data step to create a new variable with an actual date value.
data want ;
set have ;
if indexc(datestring,'-/') then date=input(datestring,anydtdte32.);
else date = input(datestring,32.) + '01JAN1900'D -2;
format date yymmdd10. ;
run;
The minus 2 accounts for two things: Excel numbers 01JAN1900 as day 1 rather than day 0, and Excel treats 1900 as a leap year.
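The same dispatch (cells holding date text versus cells holding Excel day numbers) can be sketched in Python to check the offsets. to_date and default_year are hypothetical names; only ISO yyyy-mm-dd and m/d[/yyyy] strings are handled, which is far narrower than SAS's ANYDTDTE. informat, and the year 2020 for m/d strings is an assumption. The origin 1899-12-30 is exactly '01JAN1900'd - 2.

```python
from datetime import date, datetime, timedelta

EXCEL_ORIGIN = date(1900, 1, 1) - timedelta(days=2)  # 1899-12-30, i.e. '01JAN1900'd - 2


def to_date(text: str, default_year: int = 2020) -> date:
    """Convert one cell of a mixed character date column to a real date."""
    text = text.strip()
    if "-" in text:                   # assume ISO yyyy-mm-dd
        return date.fromisoformat(text)
    if "/" in text:                   # m/d/yyyy, or m/d with an assumed year
        parts = text.split("/")
        if len(parts) == 2:
            parts.append(str(default_year))
        return datetime.strptime("/".join(parts), "%m/%d/%Y").date()
    # Otherwise: an Excel day number stored as text.
    return EXCEL_ORIGIN + timedelta(days=int(float(text)))


for cell in ["44138", "11/6", "43466", "42005"]:
    print(cell, "->", to_date(cell))
```

The last two cells show why the question's values near 42005 and 43466 are ordinary 2015 to 2019 dates once the Excel origin is used.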
Excel and SAS count days from different base dates.
Day 0 in SAS is 1 January 1960, while day 1 in Excel is 1 January 1900 (and Excel also counts a nonexistent 29 February 1900).
So you need to convert an Excel date number to a SAS date with the formula below (valid for dates after February 1900).
SAS_date = Excel_date - 21916;
data dateExample;
informat dt mmddyy8.;
set dates;
SAS_date = dates - 21916;
dt=sas_Date;
format dt date9.;
run;
I am currently trying to change the format of a date from "2010-01-11 00:00:00" to "01-11-2010" or "1/11/2010". Currently "2010-01-11 00:00:00" is a string. I have tried to convert it using the date() function, but I never get a value that Stata can recognize and sort. Does anyone have an idea how to do this?
It's best if for future questions you post attempted code and why it's not working for you.
Maybe this works in your case:
clear all
set more off
*----- example data -----
set obs 1
gen dat = "2010-01-11 00:00:00"
describe
list
*----- what you want -----
* the mask lists the string's components in order: year-month-day
gen double dat2 = clock(dat, "YMD hms")
format dat2 %tcNN-DD-CCYY
describe
list
Note that we go from string type to numeric type (double), and then adjust the display format.
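Two details are easy to miss here, and the Python sketch below illustrates both (it is an analogue, not Stata code; stata_clock and as_float32 are hypothetical names). First, a %tc value counts milliseconds since 01jan1960 00:00:00, which for 2010 is about 1.6 trillion, too large for Stata's default 4-byte float to hold exactly; hence gen double. Second, the letters of the mask must follow the order in the string: read as year-day-month, "2010-01-11" becomes 1 November 2010.

```python
import struct
from datetime import datetime, timedelta

STATA_EPOCH = datetime(1960, 1, 1)  # %tc value 0 is 01jan1960 00:00:00.000


def stata_clock(text: str, pattern: str = "%Y-%m-%d %H:%M:%S") -> int:
    """Rough analogue of clock(text, "YMD hms"): milliseconds since the epoch."""
    dt = datetime.strptime(text, pattern)
    return (dt - STATA_EPOCH) // timedelta(milliseconds=1)


def as_float32(x: float) -> float:
    """Round-trip a number through 4-byte float storage."""
    return struct.unpack("f", struct.pack("f", float(x)))[0]


ms = stata_clock("2010-01-11 00:00:00")
print(ms)                    # 1578787200000
print(as_float32(ms) == ms)  # False: a 4-byte float cannot store it exactly

# Mask order matters: read as year-day-month, the same text is 1 November 2010.
wrong = datetime.strptime("2010-01-11 00:00:00", "%Y-%d-%m %H:%M:%S")
print(wrong.date())          # 2010-11-01

# Month-day-year display of the correctly parsed value.
print((STATA_EPOCH + timedelta(milliseconds=ms)).strftime("%m-%d-%Y"))  # 01-11-2010
```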
See help format, help datetime and help datetime_display_formats.
Read also:
Stata tip 113: Changing a variable's format: What it does and does not mean
N. J. Cox. 2012.
Stata Journal Volume 12 Number 4.
http://www.stata-journal.com/article.html?article=dm0067
If you are ingesting time data in "2010-01-11 00:00:00" (SQL) format, then by default it is ingested into Stata as a str23.
If you would like it in a Stata date format that you can manipulate, you could try the following (ingested_date_1 ... being your date columns):
foreach sqltime in ingested_date_1 ingested_date_2 {
rename `sqltime' X
generate double `sqltime' = clock(X, "YMD hms")
drop X
format %tcDDmonCCYY_HH:MM:SS `sqltime'
}
This handles multiple date columns: replace ingested_date_1 ingested_date_2 etc. with your column names. It converts and reformats each one while keeping its original name.
The dates are now in a Stata-recognized time format, %tc (based on the clock), so they sort in the time sense as you expect and, unlike the ingested strings, can be used in date calculations.
Additionally, you may now change how the date is displayed to something you find comfortable to read. This makes no difference to date manipulation; it only changes the displayed appearance. For viewing as "01-11-2010" (month-day-year), use a display format along the lines of Roberto's, with the month first:
format ingested_date_i %tcNN-DD-CCYY