I was wondering whether SAS can match observations in the same table based on multiple matching conditions and then tag them.
For instance, I want observations that match on age and gender and differ only in whether they like Star Wars.
This is the data I have
age gender Like star wars Location
34 male 1 US
36 female 0 UK
24 female 1 AU
45 female 1 US
34 male 0 CH
36 female 1 US
57 female 0 US
Want
> age gender Like star wars Location Match
> 34 male 1 US Yes
> 36 female 0 UK Yes
> 24 female 1 AU No
> 45 female 1 US No
> 34 male 0 CH Yes
> 36 female 1 US Yes
> 57 female 0 US No
If you sort the data you have by age and gender, then the "matches" will be grouped together. The non-matches can easily be detected with first. and last. processing. Rows that are both first. and last. do not have a match.
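proc sort data=have;
by age gender;
run;
data want;
set have;
by age gender;
/* gender is the last BY variable, so first.gender/last.gender mark the boundaries of each age-gender group */
match = ifc(first.gender and last.gender, 'No', 'Yes');
run;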
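The data step above tags every age/gender group that has more than one row. If the rows in a group must also differ on the Star Wars flag, as the question describes, one possible sketch is to count the distinct flag values per group in PROC SQL. The variable name like_star_wars is an assumption, since the question only shows a column header; PROC SQL remerges the group count onto every row and does not guarantee the original row order.

```sas
/* Sketch only: like_star_wars is an assumed variable name for the 0/1 flag */
proc sql;
  create table want as
  select *,
         /* more than one distinct flag value means the group holds both a 1 and a 0 */
         case when count(distinct like_star_wars) > 1 then 'Yes'
              else 'No'
         end as match
  from have
  group by age, gender;
quit;
```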
This is my code:
DATA sales;
INFILE 'D:\Users\...\Desktop\Onions.dat';
INPUT VisitingTeam $ 1-20 ConcessionSales 21-24 BleacherSales 25-28
OurHits 29-31 TheirHits 32-34 OurRuns 35-37 TheirRuns 38-40;
PROC PRINT DATA = sales;
TITLE 'SAS Data Set Sales';
RUN;
This is the data, but the spacing may be incorrect.
Columbia Peaches 35 67 1 10 2 1
Plains Peanuts 210 . 2 5 0 2
Gilroy Garlics 151035 12 11 7 6
Sacramento Tomatoes 124 85 15 4 9 1
;
I need to add or delete a blank column at the 19th column. Can someone help?
Just open the dataset and check the name of the variable you want to remove. Then drop it:
data want (drop=variable_name_you_are_dropping); /* this is your output dataset */
set have; /* this is the dataset you have */
run;
I have a large dataset of a few million patient encounters that include a diagnosis, timestamp, patientID, and demographic information.
We have found that a particular type of disease is frequently comorbid with a common condition.
I would like to count the number of this type of disease that each patient has, and then create a histogram showing how many people have 1,2,3,4, etc. additional diseases.
This is the format of the data.
PatientID Diagnosis Date Gender Age
1 282.1 1/2/10 F 25
1 282.1 1/2/10 F 87
1 232.1 1/2/10 F 87
1 250.02 1/2/10 F 41
1 125.1 1/2/10 F 46
1 90.1 1/2/10 F 58
2 140 12/15/13 M 57
2 282.1 12/15/13 M 41
2 232.1 12/15/13 M 66
3 601.1 11/19/13 F 58
3 231.1 11/19/13 F 76
3 123.1 11/19/13 F 29
4 601.1 12/30/14 F 81
4 130.1 12/30/14 F 86
5 230.1 1/22/14 M 60
5 282.1 1/22/14 M 46
5 250.02 1/22/14 M 53
Generally, I was thinking of a DO loop, but I'm not sure where to start because there are duplicates in the dataset, like with patient 1 (282.1 is listed twice). I'm not sure how to account for that. Any thoughts?
Target diagnoses to count would be 282.1, 232.1, 250.02. In this example, patient 1 would have a count of 3, patient 2 would have 2, etc.
Edit:
This is what I have used, but the output is showing each PatientID on multiple lines in the output.
PROC SQL;
create table want as
select age, gender, patientID,
count(distinct diagnosis_description) as count
from dz_prev
where diagnosis in (282.1, 232.1)
group by patientID;
quit;
This is what the output table looks like. Why is this patientID showing up so many times?
Obs AGE GENDER PATIENTID count
1 55 Male 107828695 1
2 54 Male 107828695 1
3 54 Male 107828695 1
4 54 Male 107828695 1
5 54 Male 107828695 1
If you include variables that are neither grouping variables nor summary statistics, SAS will happily re-merge your summary statistics back onto all of the source records. That is why you are getting multiple records. AGE will usually vary if your dataset covers many years, and GENDER can also vary if your data is messy. So for a quick analysis you might try something like this.
proc sql;
create table want as
select patientID
, min(age) as age_at_onset
, min(gender) as gender
, count(distinct diagnosis_description) as count
from dz_prev
where diagnosis in (282.1, 232.1)
group by patientID
;
quit;
I think you can get what you want with an SQL statement
PROC SQL NOPRINT;
create table want as
select PatientID,
count(distinct Diagnosis) as count
from have
where Diagnosis in (282.1, 232.1, 250.02)
group by PatientID;
quit;
This filters to only the diagnoses you are interested in, counts the distinct diagnoses seen for each PatientID, and saves the results to a new table.
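For the histogram part of the question, a minimal sketch is to tabulate or plot the per-patient counts. It assumes the table want and the variable count created by the query above. Patients with none of the target diagnoses are removed by the WHERE clause, so they will not appear as a zero bar.

```sas
/* Frequency table: how many patients have 1, 2, 3, ... target diagnoses */
proc freq data=want;
  tables count / nocum;
run;

/* Bar chart of the same distribution, one bar per distinct count value */
proc sgplot data=want;
  vbar count;
run;
```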
I need help reversing the order of the data values in a column, either into a new column or in the same column. That is, the first value in the column should become the last value, and vice versa.
Example:
name age
karl 40
lowry 56
jim 29
robert 34
samuel 60
harry 47
The output I need should look like this:
name age
harry 47
samuel 60
robert 34
jim 29
lowry 56
karl 40
I need the data values in reverse order for both variables, age and name, or for only one of them.
First create a variable of the observation number:
data temp;
set have;
ObsNum = _n_;
run;
Then use that variable to sort the dataset:
proc sort data=temp out=want (drop=ObsNum);
by descending ObsNum;
run;
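As an alternative sketch that avoids the sort, the rows can be read back to front by direct access with the POINT= option. This assumes have is an ordinary SAS data set that supports direct access (not a view or a sequential file).

```sas
data want;
  /* walk from the last observation to the first */
  do i = nobs to 1 by -1;
    set have point=i nobs=nobs;
    output;
  end;
  /* STOP is required: with POINT= the step never reaches end of file by itself */
  stop;
run;
```

The POINT= and NOBS= variables are temporary, so they are not written to want.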
I have a data set that contains a series of variables named PG_86xt, AG_86xt, ..., all with the same suffix _86xt. How can I remove this suffix when renaming these variables?
I know how to add a prefix or suffix, but the logic for removing one seems a little different. I think proc datasets with modify is still the way to go, but the length of the substring before the suffix (or after the prefix) is unknown.
Here is the example on how to add a prefix or suffix:
data one;
input id name :$10. age score1 score2 score3;
datalines;
1 George 10 85 90 89
2 Mary 11 99 98 91
3 John 12 100 100 100
4 Susan 11 78 89 100
;
run;
proc datasets library = work nolist;
modify one;
rename &suffixlist;
quit;
You can use the scan function to get the desired result.
By altering the example you have in the link to fit your example:
data one;
input id name :$10. age PG_86xt AG_86xt IG_86xt;
datalines;
1 George 10 85 90 89
2 Mary 11 99 98 91
3 John 12 100 100 100
4 Susan 11 78 89 100
;
run;
By filtering on only those columns that fit your convention (XX_86xt), you can use the first part returned by scan for renaming.
proc sql noprint;
select cats(name,'=',scan(name, 1, '_'))
into :suffixlist
separated by ' '
from dictionary.columns
where libname = 'WORK' and memname = 'ONE' and '86xt' = scan(name, 2, '_');
quit;
You can use the index function to find the (first) place in each variable name where the suffix / prefix starts, then use that to construct appropriate parameters for substr. It's a bit more work than the code in your example, but you'll get there.
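A minimal sketch of the index/substr approach, reusing the data set one from the example above. The macro variable names suffix and renamelist are made up for illustration. Because index returns the first occurrence, this assumes the suffix text does not also appear earlier in a variable name, and the rename step will fail if no variable matches (renamelist is then never created).

```sas
%let suffix = _86xt;  /* assumed suffix to strip */

proc sql noprint;
  /* build old=new pairs, keeping only the part of each name before the suffix */
  select cats(name, '=', substr(name, 1, index(name, "&suffix") - 1))
    into :renamelist separated by ' '
  from dictionary.columns
  where libname = 'WORK' and memname = 'ONE'
    and index(name, "&suffix") > 1;
quit;

proc datasets library=work nolist;
  modify one;
  rename &renamelist;
quit;
```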
I have a table with four variables and I want a table with all combinations of their values. I am showing a table with only 2 columns as an example.
NAME AMOUNT COUNT
RAJ 90 1
RAVI 20 4
JOHN 30 5
JOSEPH 40 3
The following output shows the values only for RAJ; the actual output should cover all names.
NAME AMOUNT COUNT
RAJ 90 1
RAJ 90 4
RAJ 90 5
RAJ 90 3
RAJ 20 1
RAJ 20 4
RAJ 20 5
RAJ 20 3
RAJ 30 1
RAJ 30 4
RAJ 30 5
RAJ 30 3
RAJ 40 1
RAJ 40 4
RAJ 40 5
RAJ 40 3
...
There are a couple of useful options in SAS to do this; both create a table with all possible combinations of variables, and then you can just drop the summary data that you don't need. Given your initial dataset:
data have;
input NAME $ AMOUNT COUNT;
datalines;
RAJ 90 1
RAVI 20 4
JOHN 30 5
JOSEPH 40 3
;;;;
run;
There is PROC FREQ with SPARSE.
proc freq data=have noprint;
tables name*amount*count/sparse out=want(drop=percent);
run;
There is also PROC TABULATE.
proc tabulate data=have out=want(keep=name amount count);
class name amount count;
tables name*amount,count /printmiss;
run;
PROC TABULATE has the advantage that its output does not clash with your COUNT variable, whereas PROC FREQ writes its own frequency variable named COUNT to the OUT= data set.
Try
PROC SQL;
CREATE TABLE tbl_out AS
SELECT a.name AS name
,b.amount AS amount
,c.count AS count
FROM tbl_in AS a, tbl_in AS b, tbl_in AS c
;
QUIT;
This performs a double self-join and should have the desired effect.
Here's a variation on @JustinJDavies's answer, using an explicit CROSS JOIN clause:
data have;
input NAME $ AMOUNT COUNT;
datalines;
RAJ 90 1
RAVI 20 4
JOHN 30 5
JOSEPH 40 3
;
run;
PROC SQL;
create table combs as
select *
from have(keep=NAME)
cross join have(keep=AMOUNT)
cross join have(keep=COUNT)
order by name, amount, count;
QUIT;
Results:
NAME AMOUNT COUNT
JOHN 20 1
JOHN 20 3
JOHN 20 4
JOHN 20 5
JOHN 30 1
JOHN 30 3
JOHN 30 4
JOHN 30 5
...