Data Locations;
input coordinates $;
datalines;
35° 47' 29.5368' N and 78° 46' 52.0320' W.;
run;
How do I write the coordinates so that they are read in as one dataline?
I have tried double quotes, parentheses, and getting rid of all the inner quotes. Maybe I should use something other than an input statement with a dollar sign?
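There are a few different ways you could do it with a data step. Notice I've set the variable length to 45 in all the examples, and that the semicolon that ends the datalines block sits on its own line rather than at the end of the data line. These examples were tested in Windows SAS 9.4 only.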
Data Locations;
input coordinates $ 1-45;
datalines;
35° 47' 29.5368' N and 78° 46' 52.0320' W.
;
run;
or
Data Locations;
input coordinates $45.;
datalines;
35° 47' 29.5368' N and 78° 46' 52.0320' W.
;
run;
The SAS online docs have some pretty good examples of using the datalines statement.
Alternatively you could do this with proc sql, as below.
proc sql;
create table Locations
(coordinates char(45));
insert into Locations
values("35° 47' 29.5368' N and 78° 46' 52.0320' W.");
quit;
Some good examples of creating a table and inserting data into it with the sql procedure can be found here.
I am working on a SAS Dataset which has missing values.
I can identify whether a particular variable has missing values using IS NULL/IS MISSING operator.
Is there an alternative way to identify, in one shot, which variables have missing values?
Thanks in Advance
The syntax IS NULL or IS MISSING is limited to use in SQL code (also in WHERE statements or WHERE= dataset options since those essentially use the same parser.)
To test if a value is missing you can also use the MISSING() function. Or compare it to a missing value. So for character variables test if it is equal to all blanks: c=' '. For numeric you can test x=., but you also need to look out for special missing values. So you might test if x <= .z.
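The tests described above can be compared side by side in a small data step. This is only a sketch: the dataset demo and the variables x and c are made up for illustration.

```sas
/* hypothetical demo data: an ordinary missing, a special missing (.a), and a non-missing row */
data demo;
  length c $8;
  x = .;  c = ' ';   output;
  x = .a; c = 'abc'; output;
  x = 0;  c = 'def'; output;
run;

data flags;
  set demo;
  x_missing_fn   = missing(x);   /* 1 for . and for special missings such as .a */
  x_missing_dot  = (x = .);      /* 1 only for the ordinary missing value */
  x_missing_any  = (x <= .z);    /* 1 for every kind of numeric missing */
  c_missing_fn   = missing(c);   /* works for character variables too */
  c_missing_cmp  = (c = ' ');    /* 1 when the value is all blanks */
run;

proc print data=flags;
run;
```

The comparison x = . misses the special missing value .a, which is why MISSING() or x <= .z is the safer test.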
To get a quick summary of the number of distinct missing and non-missing values for each variable you could use the NLEVELS option on PROC FREQ. Note it might not work for a large dataset with too many distinct values, as the procedure will run out of memory.
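A minimal sketch of the PROC FREQ approach, assuming your data is in a dataset called have (a placeholder name). The ODS SELECT statement keeps only the levels summary and NOPRINT suppresses the individual frequency tables.

```sas
/* show only the table of level counts, one row per variable */
ods select nlevels;

proc freq data=have nlevels;
  tables _all_ / noprint;   /* all variables, without the one-way frequency tables */
run;
```

Any variable with a non-zero count of missing levels in the resulting table has at least one missing value.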
Use an array and the VNAME function to find the variables with missing values. If you want the rows with missing values, use the CMISS function.
data have;
infile datalines missover;
input id num char $ var $;
datalines;
1 . A C
2 3 D
5 6 B D
;
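/* gives variables with missing values, one output row per missing value found */
data want1(keep=miss);
set have;
array chars(*) _character_;
array nums(*) _numeric_;
length miss $32;
/* loop over each array separately, since the two can differ in size */
do i=1 to dim(chars);
if missing(chars(i)) then do;
miss=vname(chars(i));
output;
end;
end;
do i=1 to dim(nums);
if missing(nums(i)) then do;
miss=vname(nums(i));
output;
end;
end;
run;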
/* gives rows with missing value*/
data want(drop=rows);
set have;
rows=cmiss(of id -- var);
if rows>=1; /* keep rows with at least one missing value */
run;
You can use the TABLES statement of proc freq with the MISSING option. It adds a category for missing values if any exist. This is useful for categorical data.
data example;
input A Freq;
datalines;
1 2
2 2
. 2
;
*list variables in tables statement;
proc freq data=example;
tables A / missing;
run;
You can also use Proc Univariate, which creates a MissingValues table in the ODS output by default if any missing values exist. This is useful for numeric data.
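As a sketch of the Proc Univariate approach, reusing the example dataset from above, you can ask ODS for just that one table:

```sas
/* keep only the MissingValues table from the PROC UNIVARIATE output */
ods select MissingValues;

proc univariate data=example;
  var A;   /* A has one missing value in the example data */
run;
```

If the variable has no missing values, the table is not produced at all.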
Two options (in addition to Peter Slezák's) I can suggest are :
- Use proc means with nmiss
proc means data = ___ n nmiss;
var _numeric_;
run;
- In SAS Enterprise Guide, there is a characterize data task, which helps profile character variables too. (Under the hood, it is a combination of various procs, but it is an easy-to-use option.)
Hope this helps,
regards,
Sundaresh
I am trying to create a prediction interval in SAS. My SAS code is
Data M;
input y x;
datalines;
100 20
120 40
125 32
..
;
proc reg;
model y = x / clb clm alpha =0.05;
Output out=want p=Ypredicted;
run;
data want;
set want;
y1= Ypredicted;
proc reg data= want;
model y1 = x / clm cli;
run;
but when I run the code I could not find the new Y1. How can I predict the new Y?
What you're trying to do is score your model, which takes the results from the regression and uses them to estimate new values.
The most common way to do this in SAS is simply to use PROC SCORE. This allows you to take the output of PROC REG and apply it to your data.
To use PROC SCORE, you need the OUTEST= option (think 'output estimates') on your PROC REG statement. The dataset that you assign there will be the input to PROC SCORE, along with the new data you want to score.
As Reeza notes in comments, this is covered, along with a bunch of other ways to do this that might work better for you, in Rick Wicklin's blog post, Scoring a regression model in SAS.
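The PROC SCORE steps described above might look roughly like this. It is a sketch, not a tested solution for your data: it assumes the dataset M from the question, and the names est, new, scored and the model label Yhat are made up for illustration.

```sas
/* fit the model and save the parameter estimates */
proc reg data=M outest=est noprint;
  Yhat: model y = x;   /* the label Yhat becomes the name of the predicted variable */
run;
quit;

/* hypothetical new x values to score */
data new;
  input x;
  datalines;
25
35
;

/* apply the saved estimates to the new data */
proc score data=new score=est out=scored type=parms;
  var x;
run;

proc print data=scored;
run;
```

PROC SCORE only returns the predicted values. If you need the prediction limits as well, the OUTPUT statement of PROC REG has the LCL= and UCL= keywords for the individual prediction interval.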
I have a sas dataset that has a list of variables embedded within a single character variable, delimited by pipes. It looks something like this:
Obs. List_of_forms
1,"|FormA(04-15-2003)||FormB(04-15-2004)|",
2,"|FormA(04-15-2002)||FormA(04-15-2003)||FormB(04-15-2003)|"
I would like to extract each of the items delimited by pipes as individual variables, so the data would look something like this:
Obs., form1, form2, form3
1,"FormA(04-15-2003)","FormB(04-15-2004)",.,
2,"FormA(04-15-2002)","FormA(04-15-2003)","FormB(04-15-2003)"
But I'm at a loss for how to do this. I've thought about coding a do-loop to iterate through each pipe, but this seems needlessly complex. Any advice for a more elegant solution?
Use the SCAN() function. First we can set up your example data.
data have ;
obs+1;
input list_of_forms $60. ;
cards;
|FormA(04-15-2003)||FormB(04-15-2004)|
|FormA(04-15-2002)||FormA(04-15-2003)||FormB(04-15-2003)|
;;;;
Now we can convert it to multiple columns.
data want;
set have ;
array form (3) $60 ;
do i=1 to dim(form);
form(i) = scan(list_of_forms,i,'|');
end;
drop i;
run;
To make it more dynamic you could find the maximum number of values over the whole dataset and replace the hard coded upper bound of 3 on the new variables.
proc sql noprint ;
select max(countw(list_of_forms,'|'))
into :nforms
from have
;
quit;
...
array form (&nforms) $60 ;
*the title may be misleading
I have (column) cells values as follows:
d="M200,170L149,385"
d="M200,170L150,387"
d="M200,170L275,384"
d="M200,170L49,317"
d="M200,170L92,347"
The values 200 & 170 in each cell represent the x and y origins respectively, while the second set of values (i.e. 149 and 385) represent the x and y values.
I want to separate the x-origin, y-origin, x and y values into four columns. (I'm relatively new to SAS... I think these are Cartesian coordinates.)
How would I go about doing this?
Use the scan function. It is used to select the nth word of a string. First argument is the string you want parsed, second is the word (1st, 2nd, etc), and third lists your delimiters (characters that separate the words). That should be all you need.
data want;
set have;
origx = scan(d,1,'M,L');
origy = scan(d,2,'M,L');
x = scan(d,3,'M,L');
y = scan(d,4,'M,L');
run;
Do you have a SAS dataset with a variable named d in it, or do you have a text file? My first read was that you have a SAS dataset already, in which case you need to parse the variable. You could use SCAN() function, or plenty of other methods, e.g.:
data have;
input d $16.;
cards;
M200,170L149,385
M200,170L150,387
M200,170L275,384
M200,170L49,317
M200,170L92,347
;
run;
data want;
set have;
x_origin=scan(d,1,"M,L");
y_origin=scan(d,2,"M,L");
x=scan(d,3,"M,L");
y=scan(d,4,"M,L");
run;
proc print data=want;
run;
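Note that SCAN() returns character values. If you need numeric coordinates, one option is to wrap each call in INPUT(). A sketch, assuming the have dataset above (the name want_num is made up):

```sas
data want_num;
  set have;
  /* scan() picks out the nth word; input() converts the text to a number */
  x_origin = input(scan(d, 1, 'M,L'), best12.);
  y_origin = input(scan(d, 2, 'M,L'), best12.);
  x        = input(scan(d, 3, 'M,L'), best12.);
  y        = input(scan(d, 4, 'M,L'), best12.);
run;

proc print data=want_num;
run;
```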
I have a point data set containing latitude, longitude and elevation data. I would like to identify the nearest neighbour of a given point by using the distance between any two given points (2d or 3d). Could anybody suggest the different methods available in SAS for such geo-spatial data analysis and an example SAS code? Thanks.
Your best bet is to look into the clustering procedures, as KNN style clustering is pretty close to what you want (and at minimum cluster analysis can get you to a 'set' of neighbors to check). PROC MODECLUS, PROC FASTCLUS, PROC CLUSTER all give you some value here, as does PROC DISTANCE which is used as input in some cases to the above. Exactly what you want to use depends on what you need and your speed/size constraints (PROC CLUSTER is very slow with large datasets, but gives more useful results oftentimes).
Here is an example of nearest-neighbour calculation via the use of SQL (given in the SAS help file somewhere):
options ls=80 ps=60 nodate pageno=1 ;
data stores;
input Store $ x y;
datalines;
store1 5 1
store2 5 3
store3 3 5
store4 7 5
;
data houses;
input House $ x y;
datalines;
house1 1 1
house2 3 3
house3 2 3
house4 7 7
;
options nodate pageno=1 linesize=80 pagesize=60;
proc sql;
title 'Each House and the Closest Store';
select house, store label='Closest Store',
sqrt((abs(s.x-h.x)**2)+(abs(h.y-s.y)**2)) as dist
label='Distance' format=4.2
from stores s, houses h
group by house
having dist=min(dist);
quit;
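The same SQL pattern can be adapted to latitude/longitude data by swapping the Euclidean formula for the GEODIST function. This is only a sketch: the dataset points, its variables and the coordinates are invented for illustration, and elevation is ignored.

```sas
/* hypothetical points, latitude and longitude in decimal degrees */
data points;
  input id $ lat long;
  datalines;
A 35.79 -78.78
B 35.99 -78.90
C 36.07 -79.79
;

proc sql;
  create table nearest as
  select a.id,
         b.id as neighbor_id,
         /* 'DK' = inputs in degrees, result in kilometers */
         geodist(a.lat, a.long, b.lat, b.long, 'DK') as dist_km
  from points a, points b
  where a.id ne b.id            /* do not match a point with itself */
  group by a.id
  having calculated dist_km = min(calculated dist_km);
quit;
```

This compares every pair of points, so it is only practical for small to moderate datasets.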
I wrote 2 macros to accomplish this!
The first macro takes one input GPS location and adds its lat and lon as constant new variables to the "neighbor" location dataset. It then computes all the distances, selects the minimum value and stores it in a temp dataset.
The second (calling) macro loops through the input dataset, passes in each individual GPS location, calls the first macro to do the work and appends each minimum distance to my output dataset.
/*** first concatenate your input lat, lon as well as some id into |-separated long strings, to be split later with %scan into individual inputs ***/
%macro min_distance;
data compute_all_dis;
set all_neighbor_gps_locations;
/** here create a new variable to this big dataset with the one point gps value***/
first_lat = &latitude;
first_lon = &longitude;
ID = &PERIOD;
/** compute all **/
distance = geodist(lat, long, first_lat, first_lon, 'dm');
run;
/** get the shortest distance ***/
proc sql;
create table closest_neighbor as
select milepost,OFF_PERIOD_ID, lat, long, first_lat, first_lon, distance
from compute_all_dis
having distance = min ( distance);
quit;
%mend min_distance;
%macro find_all_closest_neighbors;
data _null_;
runno=countw("&OFF_ID",'|'); /* count the ids in the same list that %SCAN reads below */
call symputx('runno',put(runno,8.));
run;
%put &runno;
%do i=1 %to &runno;
%let PERIOD = %SCAN(&OFF_ID, &i, %str(|)); /* no quotes: in macro code they would count as delimiters too */
%let latitude = %SCAN(&LAT_I, &i, %str(|));
%let longitude = %SCAN(&LONG_I, &i, %str(|));
%min_distance;
proc datasets nolist nowarn;
append base= pout.all_close_neighbors data=closest_neighbor;
run;
quit; /* proc datasets keeps running until quit */
%end;
%mend find_all_closest_neighbors;
%find_all_closest_neighbors;