*the title may be misleading
I have a column whose cell values look like this:
d="M200,170L149,385"
d="M200,170L150,387"
d="M200,170L275,384"
d="M200,170L49,317"
d="M200,170L92,347"
The values 200 and 170 in each cell are the x and y origins respectively, while the second pair of values (e.g. 149 and 385) are the x and y values.
I want to separate the x-origin, y-origin, x and y values into four columns. (I'm relatively new to SAS... I think these are Cartesian coordinates.)
How would I go about doing this?
Use the SCAN function, which selects the nth word of a string. The first argument is the string to parse, the second is the word number (1st, 2nd, etc.), and the third lists the delimiters (the characters that separate the words). That should be all you need.
data want;
set have;
origx = scan(d,1,'M,L');
origy = scan(d,2,'M,L');
x = scan(d,3,'M,L');
y = scan(d,4,'M,L');
run;
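Note that SCAN() returns character values, so the four new columns above are text. If numeric coordinates are wanted, each word can be passed through INPUT(). The following is a minimal sketch, assuming d holds text such as M200,170L149,385 with nothing around it; the dataset name want_num is only illustrative.

```sas
data want_num;
  set have;
  /* Assumption: d looks like M200,170L149,385 (no surrounding d="..." text). */
  /* SCAN() extracts each word as text; INPUT() with the 32. informat         */
  /* converts that text to a numeric value.                                   */
  x_origin = input(scan(d, 1, 'M,L'), 32.);
  y_origin = input(scan(d, 2, 'M,L'), 32.);
  x        = input(scan(d, 3, 'M,L'), 32.);
  y        = input(scan(d, 4, 'M,L'), 32.);
run;
```

If the stored value really does include the leading d=" and the closing quote, those characters would need to be added to the delimiter list as well.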
Do you have a SAS dataset with a variable named d in it, or do you have a text file? My first read was that you already have a SAS dataset, in which case you need to parse the variable. You could use the SCAN() function, or plenty of other methods, e.g.:
data have;
input d $16.;
cards;
M200,170L149,385
M200,170L150,387
M200,170L275,384
M200,170L49,317
M200,170L92,347
;
run;
data want;
set have;
x_origin=scan(d,1,"M,L");
y_origin=scan(d,2,"M,L");
x=scan(d,3,"M,L");
y=scan(d,4,"M,L");
run;
proc print data=want;
run;
Suppose I have the following dataset:
Name Option
---- ------
A X
A
B
C X
B
E X
C
I want to delete all lines whose "Name" value appears with an X in the "Option" column on any line.
In the example above, for instance, I would like to delete all lines where Name is A, C or E.
How can I do this?
I am a beginner in SAS.
Use delete.
data want;
set have;
if(option = 'X') then delete;
run;
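An important note about delete: once it executes, SAS stops processing the current observation and returns to the top of the data step. Any statements after it do not run for that observation, and the observation is not written to the output dataset. Code after the conditional still runs for observations that are not deleted. Note also that this step deletes only the observations where option is 'X', not every observation that shares a name with them.
The remove statement is not a drop-in alternative: it is valid only in a data step that uses modify, where it deletes the observation from the dataset being updated in place.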
Since you are a beginner, let's explain some basic terminology. A DATASET consists of OBSERVATIONS (what you might call a row or a line) and VARIABLES (what you might call a column).
If you want to select observations that contain particular values then you probably want to use the IN operator.
data want;
set have;
where name not in ('A','C','E');
run;
If you want to select observations where the value of the variable NAME contains a particular letter then you probably want to use the INDEXC() function.
data want;
set have;
where not indexc(name,'ACE');
run;
If you do not care about the case of the letters then you could convert the values to uppercase and test. Or switch to the FINDC() function instead, which has more options, including one to ignore the case when checking for letter matches.
data want;
set have;
where not findc(name,'ACE','i');
run;
Here is a SQL solution if you want to delete ALL rows corresponding to a name that has any row with OPTION='X'
data have;
infile datalines missover;
input name $ option $;
datalines;
A X
A
B
C X
B
E X
C
;
proc sql;
create table remove as
select distinct(name) from have
where option = 'X'
;
create table want as
select * from have
where name not in (select name from remove)
;
quit;
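The two queries above can also be folded into one by putting the lookup in a subquery. This is only a sketch of the same idea against the same have table; it skips the intermediate remove table.

```sas
proc sql;
  /* Keep only rows whose name never appears with option = 'X'. */
  create table want as
  select *
  from have
  where name not in (select name from have where option = 'X');
quit;
```

With the sample data this keeps only the two B rows, because A, C and E each have at least one row flagged with X.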
The following code is an old-school SAS technique of SORT and MERGE and produces the same result.
proc sort data=have;
by name;
run;
data filter;
set have;
by name;
where option='X';
if first.name;
run;
data want;
merge have filter(in=residue);
by name;
if not residue;
run;
I have the following data:
data df;
input id $ d1 d2 d3;
datalines;
a . 2 3
b . . .
c 1 . 3
d . . .
;
run;
I want to apply some transformation/operation across a subset of columns. In this case, that means dropping all rows where columns prefixed with d are all missing/null.
Here's one way I accomplished this, taking heavy influence from this SO post.
First, sum all numeric columns, row-wise.
data df_total;
set df;
total = sum(of _numeric_);
run;
Next, drop all rows where total is missing/null.
data df_final;
set df_total;
where total is not missing;
run;
Which gives me the output I wanted:
a . 2 3
c 1 . 3
My issue, however, is that this approach assumes that there's only one "primary-key" column (id, in this case) and everything else is numeric and should be considered as a part of this sum(of _numeric_) is not missing logic.
In reality, I have a diverse array of other columns in the original dataset, df, and it's not feasible to simply drop all of them, writing all of that out. I know the columns for which I want to run this "test" all are prefixed with d (and more specifically, match the pattern d<mm><dd>).
How can I extend this approach to a particular subset of columns?
Use a different shortcut reference. Since you know the names all start with D, the prefix list D: covers them:
total = sum( of D:);
if n(of D:) = 0 then delete;
The prefix list picks up every variable whose name starts with D. If other variables that you want to exclude also start with D, that's problematic.
Either line is enough on its own. Since the variables are numeric, the N() function counts the non-missing values in the row, so it tests the condition directly without a helper total variable. In general, though, SAS handles missing values automatically in most PROCs such as REG/GLM (not in a data step, obviously).
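As a complete step, the N() test might look like the sketch below. It assumes the df dataset from the question and that every variable starting with d is numeric; df_final is the output name used in the question.

```sas
data df_final;
  set df;
  /* n(of d:) counts the non-missing values among all variables */
  /* whose names start with d; zero means the whole row is empty. */
  if n(of d:) = 0 then delete;
run;
```

With the sample data this keeps the rows for ids a and c.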
If that doesn't work for some reason you can query the list of variables from the sashelp table.
proc sql noprint;
select name into :var_list separated by ", " from sashelp.vcolumn
where libname='WORK' and memname='DF' and upcase(name) like 'D%' and type='num';
quit;
data df_final;
set df;
if n(&var_list.)=0 then delete;
run;
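If other variables also start with D, the name filter can be narrowed to the d<mm><dd> pattern mentioned in the question. The sketch below assumes that pattern means the letter d followed by exactly four digits, and that at least one variable matches (otherwise the macro variable var_list is never created). The sample variables d1, d2 and d3 would not match this pattern.

```sas
proc sql noprint;
  /* Assumption: target variables are named d + four digits, e.g. d0131. */
  /* \s* allows for trailing blanks in the name column.                  */
  select name into :var_list separated by ", "
  from sashelp.vcolumn
  where libname = 'WORK' and memname = 'DF'
    and type = 'num'
    and prxmatch('/^d\d{4}\s*$/i', name) > 0;
quit;

data df_final;
  set df;
  /* Drop rows where every matched variable is missing. */
  if n(&var_list.) = 0 then delete;
run;
```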
Data Locations;
input coordinates $;
datalines;
35° 47' 29.5368' N and 78° 46' 52.0320' W.;
run;
How do I write the coordinates so that they are read in as a single data line?
I have tried double quotes, parentheses, and getting rid of all the inner quotes. Maybe I should use something other than an input with a dollar sign?
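There are a few different ways you could do it with a data step. The original attempt fails for two reasons: list input with a bare $ stops at the first blank (and defaults to a length of 8), and the semicolon at the end of the data line ends the datalines block, because SAS treats the first line containing a semicolon as the end of the data. Notice I've set the variable length to 45 in all the examples. These examples were tested in Windows SAS 9.4 only.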
Data Locations;
input coordinates $ 1-45;
datalines;
35° 47' 29.5368' N and 78° 46' 52.0320' W.
;
run;
or
Data Locations;
input coordinates $45.;
datalines;
35° 47' 29.5368' N and 78° 46' 52.0320' W.
;
run;
The SAS online docs have some pretty good examples of using the datalines statement.
Alternatively you could do this with proc sql, as below.
proc sql;
create table Locations
(coordinates char(45));
insert into Locations
values("35° 47' 29.5368' N and 78° 46' 52.0320' W.");
quit;
Some good examples of creating a table and inserting data into it with the SQL procedure can be found here.
I have a SAS dataset that I created by reading in a .txt file. It has about 20-25 rows and I'd like to add a new column that assigns a letter of the alphabet, in order, to each row.
Row 1 A
Row 2 B
Row 3 C
.......
It sounds like a really basic question and one that should have an easy solution, but unfortunately, I'm unable to find this anywhere. I get solutions for adding new calculated columns and so on, but in my case, I just want to add a new column to my existing datatable - there is no other relation between the variables.
This is kind of ugly, and if you have more than 26 rows it will start to use the characters that follow Z in the ASCII table. But it does solve the problem as defined by the question.
Test data:
data have;
do row = 1 to 26;
output;
end;
run;
Explanation:
On my computer, the letter 'A' is at position 65 in the ASCII table (YMMV). We can determine this by using this code:
data _null_;
pos = rank('A');
put pos=;
run;
The ASCII table stores the alphabet sequentially, so if A is at position 65 then B is at position 66, and so on.
The byte() function returns a character from the ASCII table at a certain position. We can take advantage of this by using the position of ASCII character A as an offset, subtracting 1, then adding the row number (_n_) to it.
Final Solution:
data want;
set have;
alphabet = byte(rank('A')-1 + _n_);
run;
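If the dataset might grow past 26 rows, one option is to wrap around to A again instead of running past Z. This is a sketch, not part of the original answer; like the solution above, it assumes an encoding in which A to Z occupy consecutive positions (true for ASCII).

```sas
data want;
  set have;
  length alphabet $1;
  /* mod(_n_ - 1, 26) cycles through 0..25, so row 27 maps back to A. */
  alphabet = byte(rank('A') + mod(_n_ - 1, 26));
run;
```

Row 1 gets A, row 26 gets Z, and row 27 gets A again.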
Not better than Tom's, but essentially a brute-force alternative. Create a string holding the alphabet and then use CHAR() to pick out the character at position _n_.
data want;
set sashelp.class;
retain string 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
letter = char(string, _n_);
run;
I have a sas dataset that has a list of variables embedded within a single character variable, delimited by pipes. It looks something like this:
Obs. List_of_forms
1,"|FormA(04-15-2003)||FormB(04-15-2004)|",
2,"|FormA(04-15-2002)||FormA(04-15-2003)||FormB(04-15-2003)|"
I would like to extract each of the items delimited by pipes as individual variables, so the data would look something like this:
Obs., form1, form2, form3
1,"FormA(04-15-2003)","FormB(04-15-2004)",.,
2,"FormA(04-15-2002)","FormA(04-15-2003)","FormB(04-15-2003)"
But I'm at a loss for how to do this. I've thought about coding a do-loop to iterate through each pipe, but this seems needlessly complex. Any advice for a more elegant solution?
Use the SCAN() function. First we can set up your example data.
data have ;
obs+1;
input list_of_forms $60. ;
cards;
|FormA(04-15-2003)||FormB(04-15-2004)|
|FormA(04-15-2002)||FormA(04-15-2003)||FormB(04-15-2003)|
;;;;
Now we can convert it to multiple columns.
data want;
set have ;
array form (3) $60 ;
do i=1 to dim(form);
form(i) = scan(list_of_forms,i,'|');
end;
drop i;
run;
To make it more dynamic you could find the maximum number of values over the whole dataset and replace the hard coded upper bound of 3 on the new variables.
proc sql noprint ;
select max(countw(list_of_forms,'|'))
into :nforms
from have
;
quit;
...
array form (&nforms) $60 ;
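Put together, the dynamic version might look like the sketch below. It reuses the have dataset and the $60 length from the example above. The macro variable keeps its leading blanks, which the array declaration tolerates.

```sas
/* Find the largest number of pipe-delimited items on any row. */
proc sql noprint;
  select max(countw(list_of_forms, '|'))
    into :nforms
    from have;
quit;

data want;
  set have;
  /* Creates form1 .. form<nforms>, sized from the data itself. */
  array form (&nforms) $60;
  do i = 1 to dim(form);
    form(i) = scan(list_of_forms, i, '|');
  end;
  drop i;
run;
```

Both COUNTW() and SCAN() treat consecutive pipes as a single delimiter by default, so the doubled pipes between items do not produce empty entries.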