I have about 100,000 latitude and longitude pairs (to 4 decimals) and I would like to assign each pair to a US state. Does anyone know how to do this in SAS? Is it possible to import shapefiles into SAS for this task?
Note: the following answer assumes you have a license for SAS/GRAPH and that the MAPS library is properly set up on your installation. This will not work on SAS University Edition. If you need to download the MAPS files, they are here: http://support.sas.com/rnd/datavisualization/mapsonline/index.html
You don't need to import the SHP file for the US; SAS already has these maps built in. You can use PROC GINSIDE to determine which state and/or county each point is located in.
An example is located here:
https://support.sas.com/documentation/cdl/en/grmapref/69722/HTML/default/viewer.htm#p0qjcc8hugcjb2n1x3bmuaar16f0.htm
And copied here, for SO rules.
goptions reset=global border;
data gpscounties;
input longitude latitude site $;
x=longitude*arcos(-1)/180;
x=x*(-1);
y=latitude*arcos(-1)/180;
datalines;
-77.0348 40.0454 a
-78.4437 39.1623 b
-78.4115 39.3751 c
-78.7646 40.6354 d
;
run;
proc ginside data=gpscounties map=mapssas.counties out=gpscounties;
id state county;
run;
proc sort data=gpscounties;
by site;
run;
proc print data=gpscounties;
var site state county x y;
run;
quit;
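Since the question asks only for the state, a state-level variant may be enough. The following is a minimal sketch, not a tested solution: it assumes a MAPSSAS.STATES data set is installed and stores unprojected coordinates in radians in the same way as MAPSSAS.COUNTIES, and the data set names gpspoints and gpsstates are made up for illustration.

```sas
data gpspoints;
  input longitude latitude site $;
  /* Degrees to radians; the sign of x is flipped as in the example above */
  x = -longitude*arcos(-1)/180;
  y = latitude*arcos(-1)/180;
  datalines;
-77.0348 40.0454 a
-78.4437 39.1623 b
;
run;

/* Assumption: MAPSSAS.STATES exists and uses the same coordinate system */
proc ginside data=gpspoints map=mapssas.states out=gpsstates;
  id state;
run;

data gpsstates;
  set gpsstates;
  /* FIPSTATE converts the numeric FIPS code to a two-letter postal code */
  statecode = fipstate(state);
run;
```

With 100,000 points the same steps apply; only the input data step changes to read your own file.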
I have been working on a SAS problem on the University Edition where it is given that:
Separate out the data only for passenger vehicle launched after 1-October-2014;
data passenger;
set avik1.clean;
informat Latest_Launch ddmmyy10.;
if Vehicle_type = "Passenger" and Latest_Launch > "01-10-2014";
run;
proc print data=passenger;
run;
I am able to separate out only the passenger vehicles; however, my date condition has no effect, as it doesn't separate out the dates after 01/10/2014.
I ran PROC CONTENTS in case you would like to have a look at my data attributes:
Proc Contents Print Output
I am new to SAS and I am facing some issues whenever there is a date problem.
In SAS, date constants are written as 'DDMONYYYY'd: the date in DATE9. form, in quotes, followed by the letter D.
For you, that is '01OCT2014'd.
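Applied to the data step from the question, the filter might look like the sketch below. It assumes Latest_Launch is already stored as a numeric SAS date; if PROC CONTENTS shows it as character, it would first need converting, for example with input(Latest_Launch, ddmmyy10.).

```sas
data passenger;
  set avik1.clean;
  /* Assumption: Latest_Launch is a numeric SAS date, so it can be
     compared directly with the date literal '01OCT2014'd */
  if Vehicle_type = "Passenger" and Latest_Launch > '01OCT2014'd;
run;

proc print data=passenger;
  format Latest_Launch ddmmyy10.;
run;
```

The INFORMAT statement from the original step is dropped here because an informat only affects how raw values are read, not how an existing variable is compared.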
I am trying to create a prediction interval in SAS. My SAS code is
Data M;
input y x;
datalines;
100 20
120 40
125 32
..
;
proc reg;
model y = x / clb clm alpha =0.05;
Output out=want p=Ypredicted;
run;
data want;
set want;
y1= Ypredicted;
proc reg data= want;
model y1 = x / clm cli;
run;
but when I run the code I can't find the new Y1. How can I predict the new Y?
What you're trying to do is score your model, which takes the results from the regression and uses them to estimate new values.
The most common way to do this in SAS is simply to use PROC SCORE. This allows you to take the output of PROC REG and apply it to your data.
To use PROC SCORE, you need the OUTEST= option (think 'output estimates') on your PROC REG statement. The dataset that you assign there will be the input to PROC SCORE, along with the new data you want to score.
As Reeza notes in comments, this is covered, along with a bunch of other ways to do this that might work better for you, in Rick Wicklin's blog post, Scoring a regression model in SAS.
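A minimal sketch of that workflow, using the data set M from the question. The data sets est, new and scored, and the x values in new, are invented for illustration.

```sas
/* Fit the model and save the parameter estimates */
proc reg data=M outest=est noprint;
  model y = x;
run;
quit;

/* Hypothetical new observations to score */
data new;
  input x;
  datalines;
25
35
;
run;

/* TYPE=PARMS reads the coefficient row from OUTEST=;
   PREDICT requests predicted values rather than negative residuals */
proc score data=new score=est out=scored type=parms predict;
  var x;
run;

proc print data=scored;
run;
```

The predicted values should land in a variable named after the model label (MODEL1 by default). Note that PROC SCORE returns point predictions only, so it does not by itself give the prediction interval asked about in the question.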
Data Locations;
input coordinates $;
datalines;
35° 47' 29.5368' N and 78° 46' 52.0320' W.;
run;
How do I write the coordinates to where they are placed as one dataline?
I have tried double quotes, parentheses, and getting rid of all the inner quotes. Maybe I should use something other than an input and a dollar sign?
There are a few different ways you could do it with a data step. Notice I've set the Variable Length to 45 in all the examples. These examples were tested in Windows SAS 9.4 only.
Data Locations;
input coordinates $ 1-45;
datalines;
35° 47' 29.5368' N and 78° 46' 52.0320' W.
;
run;
or
Data Locations;
input coordinates $45.;
datalines;
35° 47' 29.5368' N and 78° 46' 52.0320' W.
;
run;
The SAS online docs have some pretty good examples of using the DATALINES statement.
Alternatively you could do this with proc sql, as below.
proc sql;
create table Locations
(coordinates char(45));
insert into Locations
values("35° 47' 29.5368' N and 78° 46' 52.0320' W.");
quit;
Some good examples of creating a table and inserting data into it using the SQL procedure can be found here.
Is it possible to score a data set with a model created by PROC ARIMA in SAS?
This is the code I have that is not working:
proc arima data=work.data;
identify var=x crosscorr=(y(7) y(30));
estimate outest=work.arima;
run;
proc score data=work.data score=work.arima type=parms predict out=pred;
var x;
run;
When I run this code I get an error from the PROC SCORE portion that says "ERROR: Variable x not found." The x column is in the data set work.data.
proc score does not support autocorrelated variables. The simplest way to get an out-of-sample score is to combine both proc arima and a data step. Here's an example using sashelp.air.
Step 1: Generate historical data
We leave out the year 1960 as our score dataset.
data have;
set sashelp.air;
where year(date) < 1960;
run;
Step 2: Generate a model and forecast
The nooutall option tells proc arima to only produce the 12 future forecasts.
proc arima data=have;
identify var=air(12);
estimate p=1 q=(2) method=ml;
forecast lead=12 id=date interval=month out=forecast nooutall;
run;
Step 3: Score
Merge together your forecast and full historical dataset to see how well the model did. I personally like the update statement because it will not replace anything with missing values.
data want;
update forecast(in=fcst)
sashelp.air(in=historical);
by Date;
/* Generate fit statistics */
Error = Forecast-Air;
PctError = Error/Air;
AbsPctError = abs(PctError);
/* Helpful for bookkeeping */
if(fcst) then Type = 'Score';
else if(historical) then Type = 'Est';
format PctError AbsPctError percent8.2;
run;
You can take this code and convert it into a generalized macro for yourself. That way in the future, if you wanted to score something, you could simply call a macro program to get what you need.
I have the following sample data and 'proc means' command.
data have;
input measure country $;
datalines;
250 UK
800 Ireland
500 Finland
250 Slovakia
3888 Slovenia
34 Portugal
44 Netherlands
4666 Austria
run;
PROC PRINT data=have; RUN;
The following PROC MEANS command prints out a listing for each country above. How can I group some of those countries (e.g. UK and Ireland as one group, Slovakia and Slovenia as Central Europe) in the PROC MEANS step, rather than adding another data step with a 'case when' or similar?
proc means data=have sum maxdec=2 order=freq STACKODS;
var measure;
class country;
run;
Thanks for any help at all on this. I understand there are various things you can do in the PROC MEANS command itself, like limiting the number of countries by doing this:
proc means data=have(WHERE=(country not in ('Finland', 'UK')
I'd like to do the grouping in the PROC MEANS command for brevity.
Thanks.
This is very easy with a format, and it works for any PROC that takes a CLASS statement.
Simply build a format, either with code or from data, then apply it with a FORMAT statement in the PROC MEANS step.
proc format lib=work;
value $countrygroup
"UK"="British Isles"
"Ireland"="British Isles"
"Slovakia","Slovenia"="Central Europe"
;
run;
proc means data=have;
class country;
var measure;
format country $countrygroup.;
run;
It's usually better to have numeric codes for country and then format those to be whichever set of names is needed at any one time, particularly as capitalization/etc. is pretty irritating, but this works well enough even here.
The CNTLIN= option in PROC FORMAT lets you build a format from a data set: FMTNAME holds the format name (what you would put on the VALUE statement), START the value to be labeled, and LABEL the label. (END holds the end of the range for numeric ranges.) There are other options as well; the documentation goes into more detail.
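As a rough sketch of the CNTLIN= route, the same grouping could be built from a data set. The data set name countrygroups is made up for illustration, and TYPE='C' is included to mark the format as a character format.

```sas
/* Control data set: one row per value to be labeled */
data countrygroups;
  length fmtname $ 32 type $ 1 start $ 11 label $ 14;
  retain fmtname 'countrygroup' type 'C';
  infile datalines dlm=',';
  input start $ label $;
  datalines;
UK,British Isles
Ireland,British Isles
Slovakia,Central Europe
Slovenia,Central Europe
;
run;

/* Build the $countrygroup format from the control data set */
proc format cntlin=countrygroups lib=work;
run;

proc means data=have sum maxdec=2;
  class country;
  var measure;
  format country $countrygroup.;
run;
```

Countries that do not appear in the control data set keep their own names, so they still show up as separate class levels.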