SAS importing multiple datasets to save name of dataset as variable

I need to access a directory with some sas datasets named all_ci, all_pd, all_vs, etc. ci would be 'care info', pd would be 'patient data' and vs would be 'vital stats.' I am reading them in as such:
data ci_all;
set DIRECTORY.all:; run;
I get a table that looks like this:
No.
16
25
20
This works, but only in that it stacks all the datasets whose names begin with all. The issue is that I need output that looks like this:
Category No.
Patient Data 16
Vital Statistics 25
Care Info 20
Since the original all_ datasets do not have the category label, I have to manually work out the order in which the all_ datasets were read and then label the rows. Is there a way to save the name of the dataset being read so that I can label the rows more easily?

Use the INDSNAME= option on the SET statement. You need to copy its value to a new variable, since the variable named in the option is automatically dropped.
libname DIRECT 'mydirectory' ;
data ci_all;
length dsname indsname $41 ;
set DIRECT.all: indsname=indsname;
dsname=indsname;
run;
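To get from the dataset name to the category labels the question asks for, one option is to map the member part of the name to a label in the same step. This is only a sketch: the variable names source and category, the 'Unknown' fallback, and the exact label text are assumptions chosen for illustration.

```sas
libname DIRECT 'mydirectory' ;

data ci_all;
  length category $16;
  /* source is a temporary variable holding LIBREF.MEMBER for the current row */
  set DIRECT.all: indsname=source;
  /* scan(..., 2, '.') keeps the member name, e.g. ALL_CI */
  select (upcase(scan(source, 2, '.')));
    when ('ALL_CI') category = 'Care Info';
    when ('ALL_PD') category = 'Patient Data';
    when ('ALL_VS') category = 'Vital Statistics';
    otherwise       category = 'Unknown';
  end;
run;
```

Because category is an ordinary variable, it is kept in the output, while source is dropped automatically.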

Related

how to update a table when the where clause's value has more than 32 characters in module of proc SQL of SAS enterprise guide

Here is the scenario: first I import a CSV file, then in PROC SQL I insert the records from the temporary dataset into the database. This is where I am stuck:
For audit purposes, I want to update one record in a database table to record this insert operation:
update table1
set inserted_record=&SQLOBS, insert_date=today()
where filename=&csv_file_name;
But the file name is longer than 32 characters. What should I do? Thanks!
My SAS code is like the following:
DATA Temp1;
File_name="kkkkkkkkkkk_product_information_20200101_20211005_FULL.csv";
run;
Data work.temptable;
length
Product_ID $36
Worth_USD $9;
Format
Product_ID Char36.
Worth_USD Char9.;
Informat
Infile
input
Run;
Libname lib1 Teradata user=userid Password=xxxxxx;
proc SQL;
insert into lib1.table1(col1,col2)
select product_id,worth_usd from work.temptable;
update lib1.import_summary set inserted_record=&sqlobs,operated_date=today() where file_name='&file_name';
Run;
According to the log, the insert succeeds but the update does not (the log shows "No rows were updated"). I checked the import_summary table, and it already contains a record whose file_name is "kkkkkkkkkkk_product_information_20200101_20211005_FULL.csv", so that record should have been updated. Can anyone advise? Thanks!
Based on the code shown, the length should not matter. You do need quotes around the file name, since it is most likely a character field, but the 32-character limit applies to SAS names such as data set names, not to character values like this file name.
update table1
set inserted_record=&SQLOBS, insert_date=today()
where filename="&csv_file_name";
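EDIT:
The WHERE clause in your actual code uses single quotes, and macro variable references are not resolved inside single quotes. This line needs double quotes instead:
where file_name='&file_name';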
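The difference between the two kinds of quotes can be checked in isolation. The sketch below assumes the file name has been stored in a macro variable called file_name with %let, which the posted code does not show; the variable names single and double are illustrative only.

```sas
/* Assumption: the file name lives in a macro variable, created here with %let */
%let file_name = kkkkkkkkkkk_product_information_20200101_20211005_FULL.csv;

data _null_;
  /* Single quotes: the text &file_name is stored literally, unresolved */
  single = '&file_name';
  /* Double quotes: the macro processor substitutes the value first */
  double = "&file_name";
  put single=;
  put double=;
run;
```

With single quotes the WHERE clause compares file_name against the literal text &file_name, which matches no row, and that is consistent with the "No rows were updated" message.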

Variable with table names in SAS

I want to create a custom variable that stores the name of the table each observation comes from.
Something like this:
data FTTH_SOHO_2;
ATTRIB scoring_month;
set fmscore.SCORE_FTTH_CHURN_SOHO_202009 - fmscore.SCORE_FTTH_CHURN_SOHO_202012
fmscore.SCORE_FTTH_CHURN_SOHO_202101 - fmscore.SCORE_FTTH_CHURN_SOHO_202106 = tablename;
scoring_month = tablename;
where tp_desig_num = 'XXXXXXXXXXX';
run;
Of course I get a syntax error, but is it possible to store the name of the data set currently being read in some kind of variable and use it to mark each observation with the data set it came from?
I need to see the months from which I receive observations.
You're looking for the INDSNAME option.
data want;
set have1 have2 have3 indsname=dsn;
ds_name = dsn;
run;
You do have to create a variable with an assignment statement separate from the indsname option, as the option creates only a temporary variable.
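Applied to the data sets in the question, this might look like the sketch below. It assumes the numbered range lists are accepted with the libref repeated on both ends, and that the month is always the last underscore-delimited piece of the data set name; the length of 6 for scoring_month is likewise an assumption based on the YYYYMM suffix.

```sas
data FTTH_SOHO_2;
  length scoring_month $6;
  /* tablename is temporary and holds LIBREF.MEMBER for the current row */
  set fmscore.SCORE_FTTH_CHURN_SOHO_202009 - fmscore.SCORE_FTTH_CHURN_SOHO_202012
      fmscore.SCORE_FTTH_CHURN_SOHO_202101 - fmscore.SCORE_FTTH_CHURN_SOHO_202106
      indsname=tablename;
  where tp_desig_num = 'XXXXXXXXXXX';
  /* Take the last piece after an underscore, e.g. 202009 */
  scoring_month = scan(tablename, -1, '_');
run;
```

Keeping only the suffix rather than the full name makes the variable easier to sort and group by month.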

Iteratively adding to merged SAS dataset

I have 18 separate datasets that contain similar information: patient ID, number of 30-day equivalents, and total day supply of those 30-day equivalents. I've output these from a dataset that contains those 3 variables plus the medication class (VA_CLASS) and the quarter it was captured in (a total of 6 quarters).
Here's how I've created the 18 separate datasets from the snip of the dataset shown above:
%macro rx(class,num);
proc sql;
create table dm_sum&class._qtr&num as select PatID,
sum(equiv_30) as equiv_30_&class._&num
from dm_qtrs
where va_class = "HS&class" and dm_qtr = &num
group by 1;
quit;
%mend;
%rx(500,1);
%rx(500,2);
%rx(500,3);
%rx(500,4);
%rx(500,5);
%rx(500,6);
%rx(501,1);
and so on...
I then need to merge all 18 datasets back together by PatID. What I'd like to do is iteratively add each newly created dataset to the previous ones, for example, add dataset dm_sum_500_qtr3 to a file that already contains the results of dm_sum_500_qtr1 and dm_sum_500_qtr2.
Thanks for looking, Brian
In the macro, append the created data set to an accumulator data set. Be sure to delete the accumulator before starting so that each run begins with a fresh accumulation. If the process is run at different times (weekly or monthly, say), you may want to add a unique index to prevent the same rows from being appended twice. If you are stacking all these sums, the create table should also select va_class and dm_qtr.
%macro rx(class, num, stack=perm.allClassNumSums);
proc sql; create table dm_sum&class._qtr&num as … ;
quit;
proc append force base=&stack data=dm_sum&class._qtr&num;
run;
%mend;
proc sql;
drop table perm.allClassNumSums;
quit;
%rx(500,1)
%rx(500,2)
%rx(500,3)
%rx(500,4)
%rx(500,5)
…
A better approach might be a single query with a larger WHERE clause that leaves the class and quarter as categorical variables. Your current approach moves data (class and qtr) into metadata (column names). Such a transformation makes additional downstream processing more difficult.
Proc TABULATE or REPORT can use a CLASS statement to help create output that has category-based columns. These procedures might even be able to work directly with the original data set and not require a preparatory SQL query.
proc sql;
create table want as
select
PatID, va_class, dm_qtr,
sum(equiv_30) as equiv_30_sum
from dm_qtrs
where catx(':', va_class, dm_qtr) in
(
'HS500:1'
'HS500:2'
'HS500:3'
…
'HS501:1'
)
group by PatID, va_class, dm_qtr;
quit;
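As a rough illustration of the TABULATE suggestion, the sketch below reports directly from dm_qtrs without the preparatory query. It assumes PatID, va_class and dm_qtr can all serve as class variables and that equiv_30 is numeric; the layout (patients as rows, class by quarter as columns) is just one possible choice.

```sas
proc tabulate data=dm_qtrs;
  /* Categories stay as data; no per-class, per-quarter column names are needed */
  class PatID va_class dm_qtr;
  var equiv_30;
  /* Rows: patients. Columns: one summed cell per class and quarter */
  table PatID, va_class*dm_qtr*equiv_30*sum;
run;
```

A WHERE statement could be added to restrict the report to the classes and quarters of interest.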

Converting sas numeric variable type in dataset to character type

I'm importing a csv to a sas dataset with this code:
PROC IMPORT
DATAFILE = '/folders/myshortcuts/SASsoftware_rialto_2015/providence_med_claims_15.csv'
OUT = medical
DBMS=DLM REPLACE;
DELIMITER='|';
getnames=yes;
run;
The subsequent code needs one of the fields in this dataset, DIAGNOSIS_VERSION_CODE, to be character rather than numeric, which is the type PROC IMPORT assigned by default. How can I change that default in the above code, or convert the field in the dataset?
I tried this and it didn't work:
contents data=medical;
modify medical;
format DIAGNOSIS_VERSION_CODE $CHAR8.;
contents data=medical;
run;
PROC DATASETS can change a variable's name, label, format, or informat, but it cannot change a variable's type or values. You will need to create a new dataset. You can use the RENAME statement to give the new variable the name of the old one.
data new_medical;
set medical ;
new_diagnosis_version_code = put(diagnosis_version_code,Z8.);
rename diagnosis_version_code=old_diagnosis_version_code
new_diagnosis_version_code=diagnosis_version_code
;
run;
To prevent this in the future you should write your own data step to read the data instead of asking PROC IMPORT to guess what data you have. Then you have control over how the variables are created.
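A hand-written data step for this file could start from something like the sketch below. The real column layout is not shown in the question, so claim_id and the two-column order are purely hypothetical placeholders; only the file path, the pipe delimiter, the header row, and DIAGNOSIS_VERSION_CODE come from the post.

```sas
data medical;
  /* firstobs=2 skips the header row; dsd treats consecutive delimiters as missing values */
  infile '/folders/myshortcuts/SASsoftware_rialto_2015/providence_med_claims_15.csv'
         dlm='|' dsd firstobs=2 truncover;
  /* Declaring the length with $ makes the variable character from the start */
  /* claim_id is a hypothetical column used only to illustrate the pattern */
  length claim_id $20 diagnosis_version_code $8;
  /* Variables must be listed in the order the columns appear in the file */
  input claim_id diagnosis_version_code;
run;
```

Every column in the file would need an entry in the LENGTH and INPUT statements, in file order.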

SAS: concatenate different datasets while keeping the individual data table names

I'm trying to concatenate multiple datasets in SAS, and I'm looking for a way to store information about individual dataset names in the final stacked dataset.
For example, the initial data sets are "my_data_1", "abc" and "xyz", each with columns 'var_1' and 'var_2'.
I want to end up with a "final" dataset with columns 'var_1', 'var_2' and 'var_3', where 'var_3' contains the value "my_data_1", "abc" or "xyz" depending on which dataset a particular row came from.
(I have a kludgy solution for doing this, i.e. adding the table name as an extra variable in each individual dataset. But I have around 100 tables to stack and I'm looking for an efficient way to do this.)
If you have SAS 9.2 or newer you have the INDSNAME option
http://support.sas.com/kb/34/513.html
So:
data final;
length dsname datasetname $20; *something equal to or longer than the longest dataset name including the library and dot;
set my_data_1 abc xyz indsname=dsname;
datasetname=dsname;
run;
Use the IN= data set option on each data set in the SET statement:
data final;
set my_data_1(in=a) abc(in=b) xyz(in=c);
if a then var_3='my_data_1';
else if b then var_3='abc';
else if c then var_3='xyz';
run;