I want to remove columns/variables from a large SAS dataset, call it 'data'. I have all of the column names that I want to drop stored in another SAS dataset - let's call it 'var', it has a single column with header column. How do I drop all of the variables contained in 'var' from my original dataset 'data' with the drop function?
Thanks!
You can use the "into" clause of proc sql to copy the column of variable names from the "vars" data set into a macro variable that you then pass to the drop= statement in a data step. See below:
proc sql noprint;
select <name_of_column> into: vars_to_drop separated by " "
from var;
quit;
data data;
set data (drop= &vars_to_drop);
run;
Related
I have an already imported dataset where the first row contains the variable names. I know that typically when importing a dataset you use getnames = yes. However, if the data is already imported how can I make the first row the variable names using a data step?
Data looks like:
A B C
1 Name 1 Name 2 Name 3
2 2 4 66
3 3 5 6
Since reading the names as data probably made all of your variables character you can try just transposing the data twice to fix it. That will work well for small datasets.
So the first transpose will place the current name into the _NAME_ variable and convert each row into a column. The second proc transpose can drop the original name and use the first row (new COL1 variable) as the names.
proc transpose data=have out=wide ;
var _all_;
run;
proc transpose data=wide(drop=_name_ rename=(col1=_name_)) out=want(drop=_name_ _label_);
var col:;
id _name_;
run;
The problem with the already imported data is that all the numeric data was likely placed in a character variables because the 'first row' of data seen by the import process contained some character data, and drove the inference for automatic column construction.
Regardless, you will need to construct renaming pairs old-name=new-name for each variables that has to be renamed. The new-name being in row 1 makes it possible to transpose that row to arrange those name parts as data. SQL with :into and separated by can populate a macro variable for use in a proc datasets step that performs the column renaming without rewriting the entire data set. Finally, a DATA step with modify can remove a row in place, again, without rewriting the entire data set.
filename sandbox temp;
data _null_;
file sandbox;
put 'A,B,C';
put 'Name 1, Name 2, Name 3';
put '2,4,66';
put '3,5,6';
run;
proc import datafile=sandbox dbms=csv replace out=work.oops;
run;
proc transpose data=oops(obs=1) out=renames;
var _all_;
run;
proc sql noprint;
select cats(_name_,"=",compress(col1,,"KN"))
into :renames separated by ' '
from renames;
%put NOTE: &=renames;
proc datasets nolist lib=work;
modify oops;
rename &renames;
run;
data oops;
modify oops;
remove;
stop;
run;
%let syslast=oops;
I want to count the number of records in a dataset in SAS. There is a function the make this thing in a simple way? I used R ed for obtain this information there was the length() function. Morover I need the number of record to compute some percetages so I need this value not in a table but in a value that can be used for other data step. How can I fix?
Thanks in advance
Here is another solution, using SAS dictionaries,
proc sql;
select nobs into: num_obs
from dictionary.tables
where libname = "WORK" and memname = "A"
;
quit;
It is easy to get the size of many datasets by modifying the above code,
proc sql;
create table test as
select memname, nobs
from dictionary.tables
where libname = "WORK" and memname like "A%"
;
quit;
data _null_;
set test;
call symput(memname, nobs);
run;
The above code will give you the sizes of all data sets with name starting with "a" in the temporary/work library.
Assuming this is a basic SAS table that you've created, and not modified or appended to, the best way is to use the meta data held in a dataset (the Number of tries is held in a piece of meta data called "nobs"), without reading through the dataset its self and place it in a macro variable. You can do this in the following way:
Data _null_;
i=1;
If i = 0 then set DATASETTOCOUNT nobs= mycount;
Call symput('mycount', mycount);
Run;
%put &mycount.;
You will now have a macro variable that contains the number of rows in your dataset, that you can call on in other data steps using &mycount.
I have created a sas code which generates many sas datasets. Now I want to append all of them to a single excel file . So first I want to convert all the column headers of sas datasets as first observation. Then leave space between these datasets (adding a blank observation). How can we do it?
one way to do this would be to use dictionary.columns
proc sql;
create table Attribute as
select * from dictionary.columns;
Read through the table and check what attributes you are interested in. For your case you might be interested in the column "NAME" <- consist of the name of all columns.
Modify the table by adding where statement to the proc sql based on the identity of the column ( from which library / what type of file / name of file) e.g. where upcase(libname)= "WORK"
data attribute;
array column [ n ] $ length ;
do i=1 to n;
set attribute ( keep = name) ;
column [ i ] = name ;
end;
run;
Then I would proceed with data step. You could use macro variable to store the value of column's names by select variable into : but anyhow you still need to hardcode the size for the array n or any other method that store value into one observation . Also remember define the length and the type of array accordingly. You can give name to the variable in the result dataset Attribute by adding var1-varnafter the length at array statement.
For simplicity I use set statement to read observation one and one and store the value of column NAME, which is the official column name derived when using dictionary.columns into the array
Note that creating a non-temporary array would create variable(s) .
Add if you want to add the blank,
data younameit ;
merge attribute attribute(firstobs=2 keep=name rename=(name=_name));
output;
if name ne _name then do;
call missing(of _all_);
output;
end;
run;
As two datasets start with different observation and column names do not duplicate within one dataset, the next row of a valid observation ( derived from the first output statement in the resulting dataset would be empty due to call missing ( of _all_ ) ; output;
Sounds like you just want to combine the datasets and write the results to the Excel file. Do you really need the extra empty row?
libname out xlsx 'myfile.xlsx';
data out.report ;
set ds1 ds2 ...;
run;
Ensure that all your columns are character (or numeric, substitute numeric), then in your data step use:
array names{*} _character_;
do i=1 to dim(names);
call label(names{i}, names{i});
end;
output;
I hope someone can help. I have a large dataset imported to SAS with thousands of variables. I want to create a new dataset by extracting variables that have a specific keyword in their name. For example, the following variables are in my dataset:
AAYAN_KK_Equity_Ask
AAYAN_KK_Equity_Bid
AAYAN_KK_Equity_Close
AAYAN_KK_Equity_Date
AAYAN_KK_Equity_Volume
AAYANRE_KK_Equity_Ask
AAYANRE_KK_Equity_Bid
AAYANRE_KK_Equity_Close
AAYANRE_KK_Equity_Date
I want to extract variables that end with _Ask and _Bid without knowing the rest of the variable's name. Is there a way to do that? I want to try using a do loop but don't know how to instruct SAS to compare each variable's last part of the name with _Ask or _Bid.
Afterwords. I want to create a new variable for each set that starts with full name of the variable except the last part (Which is _Ask or _Bid). Can I do that in using an assignment statement?
You probably want to query sashelp.vtable which holds the metadata about your data set. Assuming your data is in the library WORK and called TABLE the following creates a list of the variables that end in ASK.
proc sql;
select name into :varlist separated by " "
from sashelp.vcolumn
where libname="WORK" and memname="TABLE" and upcase(name) like '%_ASK';
quit;
*To rename the variables with MID generate a rename statement;
proc sql;
select catx("=", name, tranwrd(upcase(name), "_ASK", "_MID"))
into :rename_list separated by " "
from sashelp.vcolumn
where libname="WORK" and memname="TABLE" and upcase(name) like '%_ASK';
quit;
%put &rename_list;
data want_ask;
set work.table
(keep = &varlist);
rename &rename_list;
run;
I have a SAS dataset which has 20 character variables, all of which are names (e.g. Adam, Bob, Cathy etc..)
I would like a dynamic code to create variables called Adam_ref, Bob_ref etc.. which will work even if there a different dataset with different names (i.e. don't want to manually define each variable).
So far my approach has been to use proc contents to get all variable names and then use a macro to create macro variables Adam_ref, Bob_ref etc..
How do I create actual variables within the dataset from here? Do I need a different approach?
proc contents data=work.names
out=contents noprint;
run;
proc sort data = contents; by varnum; run;
data contents1;
set contents;
Name_Ref = compress(Name||"_Ref");
call symput (NAME, NAME_Ref);
%put _user_;
run;
If you want to create an empty dataset that has variables named like some values you have in a macro variables you could do something like this.
Save the values into macro variables that are named by some pattern, like v1, v2 ...
proc sql;
select compress(Name||"_Ref") into :v1-:v20 from contents;
quit;
If you don't know how many values there are, you have to count them first, I assumed there are only 20 of them.
Then, if all your variables are character variables of length 100, you create a dataset like this:
%macro create_dataset;
data want;
length %do i=1 %to 20; &&v&i $100 %end;
;
stop;
run;
%mend;
%create_dataset; run;
This is how you can do it if you have the values in macro variable, there is probably a better way to do it in general.
If you don't want to create an empty dataset but only change the variable names, you can do it like this:
proc sql;
select name into :v1-:v20 from contents;
quit;
%macro rename_dataset;
data new_names;
set have(rename=(%do i=1 %to 20; &&v&i = &&v&i.._ref %end;));
run;
%mend;
%rename_dataset; run;
You can use PROC TRANSPOSE with an ID statement.
This step creates an example dataset:
data names;
harry="sally";
dick="gordon";
joe="schmoe";
run;
This step is essentially a copy of your step above that produces a dataset of column names. I will reuse the dataset namerefs throughout.
proc contents data=names out=namerefs noprint;
run;
This step adds the "_Refs" to the names defined before and drops everything else. The variable "name" comes from the column attributes of the dataset output by PROC CONTENTS.
data namerefs;
set namerefs (keep=name);
name=compress(name||"_Ref");
run;
This step produces an empty dataset with the desired columns. The variable "name" is again obtained by looking at column attributes. You might get a harmless warning in the GUI if you try to view the dataset, but you can otherwise use it as you wish and you can confirm that it has the desired output.
proc transpose out=namerefs(drop=_name_) data=namerefs;
id name;
run;
Here is another approach which requires less coding. It does not require running proc contents, does not require knowing the number of variables, nor creating a macro function. It also can be extended to do some additional things.
Step 1 is to use built-in dictionary views to get the desired variable names. The appropriate view for this is dictionary.columns, which has alias of sashelp.vcolumn. The dictionary libref can be used only in proc sql, while th sashelp alias can be used anywhere. I tend to use sashelp alias since I work in windows with DMS and can always interactively view the sashelp library.
proc sql;
select compress(Name||"_Ref") into :name_list
separated by ' '
from sashelp.vcolumn
where libname = 'WORK'
and memname = 'NAMES';
quit;
This produces a space delimited macro vaiable with the desired names.
Step 2 To build the empty data set then this code will work:
Data New ;
length &name_list ;
run ;
You can avoid assuming lengths or create populated dataset with new variable names by using a slightly more complicated select statement.
For example
select compress(Name)||"_Ref $")||compress(put(length,best.))
into :name_list
separated by ' '
will generate a macro variable which retains the previous length for each variable. This will work with no changes to step 2 above.
To create populated data set for use with rename dataset option, replace the select statement as follows:
select compress(Name)||"= "||compress(_Ref")
into :name_list
separated by ' '
Then replace the Step 2 code with the following:
Data New ;
set names (rename = ( &name_list)) ;
run ;