I have a WORK dataset with more than 30 columns, of which only 2 are date fields (start date and end date). I want these dates in the permanent dataset to use the date. format rather than yymmdd10., which is their current format in the WORK dataset. When I use the code below, the two date fields take the first two positions. I don't want the columns reordered, and I also don't want to list all 30+ columns in the format statement. Is there any way to do this?
data DLR.DEALER;
set work.dealer_invoices; * this dataset contains more than 30 columns;
format start_dt end_dt date.;
run;
I could not find a solution for this on this site. Any help beyond simply listing all the columns in the format statement is highly appreciated :) Thanks in advance.
Certainly the format statement shouldn't have any impact on ordering given its location.
A workaround would be to use PROC DATASETS to change the format instead of in the data step.
You also could "mention all columns" fairly easily.
proc sql;
select name into :namelist separated by ' '
from dictionary.columns
where libname='WORK' and memname='DEALER_INVOICES'
order by varnum;
quit;
then
data DLR.DEALER;
retain &namelist;
set work.dealer_invoices;
format...;
run;
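The PROC DATASETS workaround mentioned above might look like the following sketch. It copies the data unchanged, so the column order is preserved, and then changes only the format metadata in place. It assumes the DLR library is already assigned. The date9. width is an illustrative choice; the date. format in the question defaults to a width of 7.

```sas
/* Copy the WORK data set as-is, so the column order is untouched */
data DLR.DEALER;
  set work.dealer_invoices;
run;

/* Change only the format metadata; the data itself is not rewritten */
proc datasets library=DLR nolist;
  modify DEALER;
  format start_dt end_dt date9.;
quit;
```

Because MODIFY only touches the descriptor portion of the data set, this also avoids a second full pass over the data.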
I have a dataset with 20 columns all starting with the name morb_, which are all 1 or 2, coded as No and Yes. There is an additional column called Pat_TNO which is the patient reference number. Patients have more than one row.
I wish to create a new dataset which summarises whether each patient has had at least one of each type of event. So far the code I have written works perfectly, but is there a way to simplify it using an array?
proc sql;
select
Pat_TNO,
max(morb_1) as morb_1 format yn.,
max(morb_2) as morb_2 format yn. /* etc etc */
from morbidity
group by Pat_TNO;
quit;
Column names aren't morb_1 and morb_2, but rather morb_amputation, morb_mi, morb_tia, etc.
proc summary data=morbidity nway missing;
class pat_tno;
output out=max max(morb_:) = ;
run;
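Since the question asks specifically about arrays, here is a hedged sketch of a data step alternative. It assumes the morb_ variables are all numeric and that there are 20 of them, as stated in the question. The names morb_sorted, patient_summary and mx are hypothetical.

```sas
/* BY-group processing requires the data to be sorted by patient */
proc sort data=morbidity out=morb_sorted;
  by Pat_TNO;
run;

data patient_summary;
  set morb_sorted;
  by Pat_TNO;
  array morb{*} morb_:;          /* all variables whose names start with morb_ */
  array mx{20} _temporary_;      /* running maximum per variable; 20 is taken from the question */

  /* reset the running maxima at the start of each patient */
  if first.Pat_TNO then call missing(of mx{*});

  do i = 1 to dim(morb);
    mx{i} = max(mx{i}, morb{i}); /* MAX ignores missing values */
  end;

  /* on the patient's last row, write the maxima back and output one row */
  if last.Pat_TNO then do;
    do i = 1 to dim(morb);
      morb{i} = mx{i};
    end;
    output;
  end;
  drop i;
run;
```

This produces one row per patient, but it is longer than the PROC SUMMARY version, so the procedure is probably still the simpler choice.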
How do I start this?
I have two data sets.
For the output you will deliver:
It should be an excel or XML format
Each query logic/programmed check should be on each tab
Columns should be
Subject #,
Visit Date (You will need the Visit Date Listing also attached)
Visit Name (Visit date from the file_34422 must match Visit name in the Blood Pressure File)
Date of Assessment (From the BP Log), VSBPDT_RAW, VSTPT, BP results.
A column for each of SYSBP1, SYSBP2, SYSBP3, DIABP1, DIABP2, DIABP3
Findings/query text.
Below are Specification for BP:
For same SUBJECT and same FOLDERNAME, where VSTPT is Blood Pressure 1.
if VSBPYN is No, then all must be null or =0 (VSBPDT_RAW, VSBPTM1, SYSBP1, DIABP1, VSBPND2, VSBPTM2, SYSBP2, DIABP2, VSBPND3, VSBPTM3, SYSBP3, DIABP3)
This is what I have started with:
proc sql;
select
f.subject,
f.SVSTDT_RAW, f.FolderName,
b.FolderName,
VSBPDT_RAW, VSTPT,
SYSBP1, SYSBP2, SYSBP3,
DIABP1, DIABP2, DIABP3
FROM first_data as f, bp_data as b
group by subject, foldername
where f.subject = b.subject
having VSTPT is Blood Pressure set 1,
VSBPYN is No;
quit;
I just need to be pointed towards the right direction. I know this can't be right.
I do not know the exact structure of your data, so the solution below may need to be modified by you to select the right columns.
From the description, this looks like a good situation for SQL plus a data step. You have a lot of columns to merge with the bp table, and it is easy to merge all of these columns with first_data in SQL.
When you have lots of by-row conditionals, a data step is easier to work with and read than many CASE expressions in SQL. We'll take a two-stage approach that uses SQL for the join and a data step for the conditional logic.
Step 1: Merge the data
proc sql noprint;
create table stage as
select t1.*
, t2.VSBPYN
from bp_data as t1
INNER JOIN
first_data as t2
ON t1.subject = t2.subject
AND t1.foldername = t2.foldername
where t1.VSTPT = 1
;
quit;
Step 2: Conditionally set values to missing
Next, we'll use a data step for our conditional logic. CALL MISSING is a useful routine that lets you set many variables to missing in a single statement.
data want;
set stage;
if(upcase(VSBPYN) = 'NO') then call missing(VSBPDT_RAW, VSBPTM1, SYSBP1, DIABP1,
VSBPND2, VSBPTM2, SYSBP2, DIABP2,
VSBPND3, VSBPTM3, SYSBP3, DIABP3
);
run;
Step 3: Output to Excel
Finally, we send the output to Excel.
proc export
data=want
outfile='/my/location/want.xlsx'
dbms=xlsx
replace;
run;
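The specification asks for each programmed check on its own tab. One way to sketch that, assuming SAS 9.4 with the XLSX engine available, is to export each result to the same workbook with a different SHEET name. The data set want2, the file name bp_checks.xlsx and the sheet names are hypothetical placeholders.

```sas
/* First check goes to its own tab */
proc export
  data=want
  outfile='/my/location/bp_checks.xlsx'
  dbms=xlsx
  replace;
  sheet='BP_Check_1';
run;

/* A second check (hypothetical data set want2) goes to another tab
   in the same workbook */
proc export
  data=want2
  outfile='/my/location/bp_checks.xlsx'
  dbms=xlsx
  replace;
  sheet='BP_Check_2';
run;
```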
I want to insert values into a new table, but I keep getting the same error: VALUES clause 1 attempts to insert more columns than specified after the INSERT table name. This is if I don't put apostrophes around my date. If I do put apostrophes then I get told that the data types do not correspond for the second value.
proc sql;
create table date_table
(cvmo char(6), next_beg_dt DATE);
quit;
proc sql;
insert into date_table
values ('201501', 2015-02-01)
values ('201502', 2015-03-01)
values ('201503', 2015-04-01)
values ('201504', 2015-05-01);
quit;
The second value has to remain a date because it is used with > and < comparisons later on. I think the problem may be that 2015-02-01 just isn't a valid date format, since I couldn't find it on the SAS website, but I would rather not change my whole table.
Date literals (constants) are quoted strings with the letter d immediately after the close quote. The string needs to be in a format that is valid for the DATE informat.
'01FEB2015'd
"01-feb-2015"d
'1feb15'd
If you really want to insert a series of dates then just use a data step with a DO loop. Also make sure to attach one of the many date formats to your date values so that they will print as human understandable text.
data date_table ;
length cvmo $6 next_beg_dt 8;
format next_beg_dt yymmdd10.;
do _n_=1 to 4;
cvmo=put(intnx('month','01JAN2015'd,_n_-1,'b'),yymmn6.);
next_beg_dt=intnx('month','01JAN2015'd,_n_,'b');
output;
end;
run;
@Tom explained in the comments how to write date literals, and his answer shows how to build the rows efficiently, which is less error-prone than typing the values. I am just applying the same date literal syntax to the INSERT statement.
proc sql;
create table date_table
(cvmo char(6), next_beg_dt DATE);
quit;
proc sql;
insert into date_table
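values ('201501', "01FEB2015"D)
values ('201502', "01MAR2015"D)
values ('201503', "01APR2015"D)
values ('201504', "01MAY2015"D)
;
quit;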
Good morning all,
I've got 3 different columns in a data set that represent a month, a date, and a year as numbers. My issue right now is concatenating these together in PROC SQL while keeping them formatted as a date. So far, I've tried the following, but I'm only getting results that show every date as a period ("."). You'll notice that I had to convert them to characters to be able to concatenate them.
PROC SQL;
SELECT
INPUT(PUT(f.MTH,z2.) || '-' || PUT(f.DAY,z2.) || '-' || PUT(f.YR,z4.),date9.)
FROM
table f
;QUIT;
I tried rearranging the year/day/month, and tried with and without the '-' between them. Still, I'm just getting a period in every row.
It is worth noting that the numbers look fine when just concatenated by themselves, without any attempt at date formatting. But I need their column to be a DATE column for the process the data is being used for.
PUT(f.YR,z4.)|| PUT(f.MTH,z2.) ||PUT(f.DAY,z2.)
The expression above looks fine with and without '-' separating the numbers. On that note, date9. isn't strictly the date format I need; I really just need the column to be a DATE of some sort.
What am I missing here? Should I not be relying so heavily on PROC SQL to do this?
Use the MDY function. Since the variables are already numeric, I think it's the best option.
proc sql;
select
mdy(f.MTH, f.DAY, f.YR) format ddmmyy10. as DAY_DATE_FORMAT
from table f;
quit;
Just to explain why your code didn't work:
The date9. informat expects the day first and a month name such as "Jan" or "Feb" (for example 25DEC2015), but you're passing it a numeric month written with z2. Because your string is built in month-day-year order, changing date9. to mmddyy10. in your INPUT call will make it work.
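For the INPUT-based approach, a sketch that matches the month-day-year order in which the question builds its string would read it with the mmddyy10. informat and then attach a display format. The alias full_date is a hypothetical name, and table f is the placeholder from the question.

```sas
proc sql;
  select
    /* build 'mm-dd-yyyy' text, then read it with a matching informat */
    input(put(f.MTH, z2.) || '-' || put(f.DAY, z2.) || '-' || put(f.YR, z4.),
          mmddyy10.)
      format=date9. as full_date   /* stored as a numeric SAS date, shown as 25DEC2015 */
  from table f;
quit;
```

The informat controls how the text is read, while the format only controls how the resulting numeric date is displayed, so any date format can be used on the result.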
I have this dataset here which looks like this:
Basically I want to manipulate the data set so that I have
GVKEY1 as unique, such as 1004, then a unique year number such as 1996, then several gvkey2 values after that. However, the number of gvkey2 values for each year is not the same. Does anyone know how to get around this problem? This means I will have several lines of data for gvkey1 1004, one per year, since I have years from 1996 to 2008. Then for each year I will have many columns, where each column holds a gvkey2.
Best Regards,
Naz
Can you not just use PROC TRANSPOSE?
proc sort data=your_data_set out=temp1;
by gvkey1 year;
run;
proc transpose data=temp1 out=temp2;
by gvkey1 year;
var gvkey2;
run;
This will give you a series of variables COL1 - COLx. Use the PREFIX option for different variable names.
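As a sketch of the PREFIX option, the following variant names the new columns gvkey2_1, gvkey2_2, and so on. The prefix gvkey2_ is an illustrative choice, and temp1 is the sorted data set from the step above.

```sas
proc transpose data=temp1 out=temp2(drop=_name_) prefix=gvkey2_;
  by gvkey1 year;   /* one output row per gvkey1/year pair */
  var gvkey2;       /* values spread across gvkey2_1, gvkey2_2, ... */
run;
```

Pairs with fewer gvkey2 values than the largest group simply get missing values in the trailing columns, which handles the uneven counts per year.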
I'm not sure I've understood your question, but if you're looking for unique gvkey1/year pairs, you could do either of these:
proc sql;
create table results as
select distinct gvkey1, year
from _your_data_set;
quit;
or
proc sort data=_your_data_set(keep=gvkey1 year) out=results nodupkey;
by gvkey1 year;
run;
If that's not what you're looking for, I suggest posting an example of the results you want.