I am trying to extract all the Time occurrences for only the recent visit. Can someone help me with the code please.
Here is my data:
Obs Name Date Time
1 Bob 2017090 1305
2 Bob 2017090 1015
3 Bob 2017081 0810
4 Bob 2017072 0602
5 Tom 2017090 1300
6 Tom 2017090 1010
7 Tom 2017090 0805
8 Tom 2017072 0607
9 Joe 2017085 1309
10 Joe 2017081 0815
I need the output as:
Obs Name Date Time
1 Bob 2017090 1305,1015
2 Tom 2017090 1300,1010,0805
3 Joe 2017085 1309
Right now my code is designed to give me only one recent entry:
DATA OUT2;
SET INP1;
BY DATE;
IF FIRST.DATE THEN OUTPUT OUT2;
RETURN;
I would first sort the data by name and date. Then I would transpose and process the results.
proc sort data=have;
by name date;
run;
proc transpose data=have out=temp1;
by name date;
var value;
run;
data want;
set temp1;
by name date;
if last.name;
format value $2000.;
value = catx(',',of col:);
drop col: _name_;
run;
You may want to further process the new VALUE to remove excess commas (,) and missing value .'s.
Very similar to the question yesterday from another user, you can use quite a few solutions here.
SQL again is the easiest; this is not valid ANSI SQL and pretty much only SAS supports this, but it does work in SAS:
proc sql;
select name, date, time
from have
group by name
having date=max(date);
quit;
Even though date and time are not on the group by it's legal in SAS to put them on the select, and then SAS automatically merges (inner joins) the result of select name, max(date) from have group by name having date=max(date) to the original have dataset, returning multiple rows as needed. Then you'd want to collapse the rows, which I leave as an exercise for the reader.
You could also simply generate a table of maximum dates using any method you choose and then merge yourself. This is probably the easiest in practice to use, in particular including troubleshooting.
The DoW loop also appeals here. This is basically the precise SAS data step implementation of the SQL above. First iterate over that name, figure out the max, then iterate again and output the ones with that max.
proc sort data=have;
by name date;
run;
data want;
do _n_ = 1 by 1 until (last.name);
set have;
by name;
max_Date = max(max_date,date);
end;
do _n_ = 1 by 1 until (last.name);
set have;
by name;
if date=max_date then output;
end;
run;
Of course here you more easily collapse the rows, too:
data want;
length timelist $1024;
do _n_ = 1 by 1 until (last.name);
set have;
by name;
max_Date = max(max_date,date);
end;
do _n_ = 1 by 1 until (last.name);
set have;
by name;
if date=max_date then timelist=catx(',',timelist,time);
if last.name then output;
end;
run;
If the data is sorted then just retain the first date so you know which records to combine and output.
proc sort data=have ;
by name descending date time;
run;
data want ;
set have ;
by name descending date ;
length timex $200 ;
retain start timex;
if first.name then do;
start=date;
timex=' ';
end;
if date=start then do;
timex=catx(',',timex,time);
if last.date then do;
output;
call missing(start,timex);
end;
end;
drop start time ;
rename timex=time ;
run;
Related
I have a sas datebase with something like this:
id birthday Date1 Date2
1 12/4/01 12/4/13 12/3/14
2 12/3/01 12/6/13 12/2/14
3 12/9/01 12/4/03 12/9/14
4 12/8/13 12/3/14 12/10/16
And I want the data in this form:
id Date Datetype
1 12/4/01 birthday
1 12/4/13 1
1 12/3/14 2
2 12/3/01 birthday
2 12/6/13 1
2 12/2/14 2
3 12/9/01 birthday
3 12/4/03 1
3 12/9/14 2
4 12/8/13 birthday
4 12/3/14 1
4 12/10/16 2
Thanks by ur help, i'm on my second week using sas <3
Edit: thanks by remain me that i was not finding a sorting method.
Good day. The following should be what you are after. I did not come up with an easy way to rename the columns as they are not in beginning data.
/*Data generation for ease of testing*/
data begin;
input id birthday $ Date1 $ Date2 $;
cards;
1 12/4/01 12/4/13 12/3/14
2 12/3/01 12/6/13 12/2/14
3 12/9/01 12/4/03 12/9/14
4 12/8/13 12/3/14 12/10/16
; run;
/*The trick here is to use date: The colon means everything beginning with date, comparae with sql 'date%'*/
proc transpose data= begin out=trans;
by id;
var birthday date: ;
run;
/*Cleanup. Renaming the columns as you wanted.*/
data trans;
set trans;
rename _NAME_= Datetype COL1= Date;
run;
See more from Kent University site
Two steps
Pivot the data using Proc TRANSPOSE.
Change the names of the output columns and their labels with PROC DATASETS
Sample code
proc transpose
data=have
out=want
( keep=id _label_ col1)
;
by id;
var birthday date1 date2;
label birthday='birthday' date1='1' date2='2' ; * Trick to force values seen in pivot;
run;
proc datasets noprint lib=work;
modify want;
rename
_label_ = Datetype
col1 = Date
;
label
Datetype = 'Datetype'
;
run;
The column order in the TRANSPOSE output table is:
id variables
copy variables
_name_ and _label_
data based column names
The sample 'want' shows the data named columns before the _label_ / _name_ columns. The only way to change the underlying column order is to rewrite the data set. You can change how that order is perceived when viewed is by using an additional data view, or an output Proc that allows you to specify the specific order desired.
I have a list of customer addresses in a dataset where I am trying to locate the Country of residence, for example: NEWSOUTHWALESAUSTRALIA could be indexed to report the country as Australia. I am trying to use the do loop approach to scan through the list of 252 Countries to relate the Country of residence from a dataset called address_format
The dataset test has the list of 252 Countries which have upcased & compressed, as has the field concat_address, so should no issues with differences in the text.
%macro counter;
%do ii = 1 %to 252;
data test;
set country_data (obs=&ii.);
call symput('New_upcase_country',trim(New_upcase_country));
country_new = compress(trim(country_two));
call symput('country_new',trim(country_new));
run;
data ADDRESS_FORMAT_NEW;
set ADDRESS_FORMAT;
length success $70.;
format success $70.;
if index(concat_address,"&country_new.") ge 1
then do ;
country="&country_new.";
end;
run;
%end;
%mend;
%counter;
For some odd reason If I manually programme if index(concat_address,'AUSTRALIA'), I get results, but inside the macro the results are blank.
Is there something obvious I am missing that is preventing the success of the country index?
The obs= option can be thought of as lastobs= (there is no lastobs option).
The option for skipping the first n-1 observations is firstobs=
This example will yield 4 rows (8-5+1)
data class;
set sashelp.class (firstobs=5 obs=8);
run;
So you want
firstobs=&ii obs=&ii, or
firstobs=&ii and a STOP;RUN; to prevent going to row &ii+1 and beyond.
Despite the above answer, I would recommend switching to a no macro approach that does all 252 checks in one data step (versus one step per check). There are numerous ways of doing such, here is one way that does not use arrays or hashes
For example:
data have;
input;
text = _infile_;
datalines;
BONGOBONGOAUSTRALIA
SOMEWHERE IN CHINA
CANADA USA
TIBET LANE, NORWAY
run;
data countries;
length name $50;
input;
name = _infile_;
datalines;
AUSTRALIA
GERMANY
UNITED STATES
TIBET
NORWAY
run;
Output only first match. An important code feature is using point= and nobs= options on the set statement inside the do loop.
data want;
set have;
do index = 1 to check_count until (found);
set countries point=index nobs=check_count;
found = index(text,trim(name));
if found then matched_country = name;
end;
run;
Output all matches
data want (keep=text matched_country);
set have;
do index = 1 to check_count;
set countries point=index nobs=check_count;
found = index(text,trim(name));
if found then do;
found_count = sum(found_count,1);
matched_country = name;
output;
end;
end;
if not found_count > 0 then do;
matched_country = '** NO MATCH **';
output;
end;
run;
Suppose the dataset has three columns
Date Region Price
01-03 A 1
01-03 A 2
01-03 B 3
01-03 B 4
01-03 A 5
01-04 B 4
01-04 B 6
01-04 B 7
I try to get the lead price by date and region through following code.
data want;
set have;
by _ric date_l_;
do until (eof);
set have(firstobs=2 keep=price rename=(price=lagprice)) end=eof;
end;
if last.date_l_ then call missing(lagprice);
run;
However, the WANT only have one observations. Then I create new_date=date and try another code:
data want;
set have nobs=nobs;
do _i = _n_ to nobs until (new_date ne Date);
if eof1=0 then
set have (firstobs=2 keep=price rename=(price=leadprice)) end=eof1;
else leadprice=.;
end;
run;
With this code, SAS is working slowly. So I think this code is also not appropriate. Could anyone give some suggestions? Thanks
Try sorting by the variables you want lead price for then set together twice:
data test;
length Date Region $12 Price 8 ;
input Date $ Region $ Price ;
datalines;
01-03 A 1
01-03 A 2
01-03 B 3
01-03 B 4
01-03 A 5
01-04 B 4
01-04 B 6
01-04 B 7
;
run;
** sort by vars you want lead price for **;
proc sort data = test;
by DATE REGION;
run;
** set together twice -- once for lead price and once for all variables **;
data lead_price;
set test;
by DATE REGION;
set test (firstobs = 2 keep = PRICE rename = (PRICE = LEAD_PRICE))
test (obs = 1 drop = _ALL_);
if last.DATE or last.REGION then do;
LEAD_PRICE = .;
end;
run;
You can use proc expand to generate leads on numeric variables by group. Try the following method instead:
Step 1: Sort by Region, Date
proc sort data=have;
by Region Date;
run;
Step 2: Create a new ID variable to denote observation numbers
Because you have multiple values per date per region, we need to generate a new ID variable so that proc expand uses lead by observation number rather than by date.
data have2;
set have;
_ID_ = _N_;
run;
Step 3: Run proc expand by region with the lead transformation
lead will do exactly as it sounds. You can lead by as many values as you'd like, as long as the data supports it. In this case, we are leading by one observation.
proc expand data=have2
out=want;
by Region;
id _ID_;
convert Price = Lead_Price / transform=(lead 1) ;
run;
I have a SAS dataset similar to the one created here.
data have;
input date :date. count;
cards;
20APR2012 10
20APR2012 20
20APR2012 20
27APR2012 15
27APR2012 5
;
run;
proc sort data=have;
by date;
run;
I want to create a column containing the sum for each date, so it would look like
date total
20APR2012 50
27APR2012 20
I have tried using first. but I think my syntax is off. Thanks.
This is what proc means is for.
proc means data=have;
class date;
var count;
output out=want sum=total;
run;
The code below works to give you your desired result.
proc sql;
create table wanted_tab as
select
date format date9.,
sum(count) as Total
from have
group by date;
;
quit;
I need a column a total as an observation.
Input Dataset Output Dataset
------------- --------------
data input; Name Mark
input name$ mark; a 10
datalines; b 20
a 10 c 30
b 20 Total 60
c 30
;
run;
The below code which I wrote is working fine.
data output;
set input end=eof;
tot + mark;
if eof then
do;
output;
name = 'Total';
mark = tot;
output;
end;
else output;
run;
Please suggest if there is any better way of doing this.
PROC REPORT is a good solution for doing this. This summarizes the entire report - other options give you the ability to summarize in groups.
proc report out=outds data=input nowd;
columns name mark;
define name/group;
define mark/analysis sum;
compute after;
name = "Total";
line "Total" mark.sum;
endcomp;
run;
Your code is fine in general, however the issue might be in terms of performance. If the input table is huge, you end up rewriting full table.
I'd suggest something like this:
proc sql;
delete from input where name = 'Total';
create table total as
select 'Total' as name length=8, sum(mark) as mark
from input
;
quit;
proc append base=input data=total;
run;
Here you are reading full table but writing only a single row to existing table.