How to insert records from other tables to a table with more columns in sas - sas

I have two sas tables, A and B, A has two columns (i.e., columna columnb) and table B has four columns (i.e., columna columnb columnc columnd ), I wish to insert records from table A to table B, I tried the following, but it shows me errors:
PROC SQL;
insert into B
select *, columnc='a', columnd='b' from A;
QUIT;

Assuming you just want to leave the extra columns empty then don't include them in the insert. It is much easier to just use SAS code instead of SQL code.
proc append base=b data=a force nowarn;
run;
For the SQL Insert statement you need to specify which columns in the target table you are writing into, otherwise it assumes you will specify values for all of them.
insert into B (columna,columnb)
select columna,columnb
from A
;
If instead you want to fill the extra columns with constants then include the constants in the SELECT list.
insert into B (columna,columnb,columnc,columnd)
select columna,columnb,'a','b'
from A
;
If you are positive that you are providing the values in the right order then you can leave the column names off of the target table specification.
insert into B
select *,'a','b'
from A
;

You can't specify the variable name that way; in fact, you can't specify the variable at all using insert into. See this example:
proc sql;
create table class like sashelp.class;
alter table class
add rownum numeric;
alter table class
add othcol numeric;
insert into class
select *, 1 as othcol, monotonic() as rownum from sashelp.class;
quit;
Here I use as to specify the column name, but notice that it doesn't actually work: it puts 1 in the rownum column, and the monotonic() value in othcol, since they're in that order on the table.

Related

SAS - Keep only columns listed in a separate dataset

I have two datasets. The first, big_dataset, has around 3000 columns, most of which are never used. The second, column_list, contains a single column called column_name with around 100 values. Each value is the name of a column I want to keep.
I want to filter big_dataset so that only columns in column_list are kept, and the rest are discarded.
If I were using Pandas dataframes in Python, this would be a trivial task:
cols = column_list['column_name'].tolist()
smaller_dataset = big_dataset[cols]
However, I can't figure out the SAS equivalent. Proc Transpose doesn't let me turn the rows into headers. I can't figure out a statement in the data step that would let this work, and as far as I'm aware this isn't something that Proc SQL could handle. I've read through the docs on Proc Datasets and that doesn't seem to have what I need either.
To obtain a list of columns from column_list to use against big_dataset, you can query the column_list table and put the result into a macro variable. This can be achieved with PROC SQL and the SEPARATED BY clause:
proc sql noprint;
select column_name
into :cols separated by ','
from column_list;
create table SMALLER_DATASET AS
select &cols.
from WORK.BIG_DATASET;
quit;
Alternatively you may use SEPARATED BY ' ' and then use the resulting list in a KEEP statement or dataset option:
proc sql noprint;
select column_name
into :cols separated by ' '
from column_list;
quit;
data small_dataset;
set big_dataset (keep=&cols.);
/* or keep=&cols.; */
run;

Using proc sql, Where at least 1 column is a value

How do i write in sas:
proc sql;
create table THIS as
select *
from MAIN(keep=id col1 -- col34)
where (AT LEAST ONE OF THE COLUMNS contains 1) ;
;
I am having a problem figuring out how to write that last line bc I want to keep all columns so I am not just checking one column i want to check for all of them.
You will have more flexibility if you use a DATA step instead of PROC SQL since you cannot use variable lists in PROC SQL code.
Assuming all of the variables in your list are numeric you could do something like this.
data this;
set main ;
keep id col1 -- col34;
if whichn(1,of col1 -- col34);
run;
Tom is right, the best approach is with a data step. If you are certain you want to do it with SQL though you could do something like this:
proc sql noprint;
create table THIS as
select *
from MAIN(keep=id col1 -- col34)
where sum(col1,col2,col3, ... ,col34)
;
quit;

select only a few columns from a large table in SAS

I have to join 2 tables on a key (say XYZ). I have to update one single column in table A using a coalesce function. Coalesce(a.status_cd, b.status_cd).
TABLE A:
contains some 100 columns. KEY Columns ABC.
TABLE B:
Contains just 2 columns. KEY Column ABC and status_cd
TABLE A, which I use in this left join query is having more than 100 columns. Is there a way to use a.* followed by this coalesce function in my PROC SQL without creating a new column from the PROC SQL; CREATE TABLE AS ... step?
Thanks in advance.
You can take advantage of dataset options to make it so you can use wildcards in the select statement. Note that the order of the columns could change doing this.
proc sql ;
create table want as
select a.*
, coalesce(a.old_status,b.status_cd) as status_cd
from tableA(rename=(status_cd=old_status)) a
left join tableB b
on a.abc = b.abc
;
quit;
I eventually found a fairly simple way of doing this in proc sql after working through several more complex approaches:
proc sql noprint;
update master a
set status_cd= coalesce(status_cd,
(select status_cd
from transaction b
where a.key= b.key))
where exists (select 1
from transaction b
where a.ABC = b.ABC);
quit;
This will update just the one column you're interested in and will only update it for rows with key values that match in the transaction dataset.
Earlier attempts:
The most obvious bit of more general SQL syntax would seem to be the update...set...from...where pattern as used in the top few answers to this question. However, this syntax is not currently supported - the documentation for the SQL update statement only allows for a where clause, not a from clause.
If you are running a pass-through query to another database that does support this syntax, it might still be a viable option.
Alternatively, there is a way to do this within SAS via a data step, provided that the master dataset is indexed on your key variable:
/*Create indexed master dataset with some missing values*/
data master(index = (name));
set sashelp.class;
if _n_ <= 5 then call missing(weight);
run;
/*Create transaction dataset with some missing values*/
data transaction;
set sashelp.class(obs = 10 keep = name weight);
if _n_ > 5 then call missing(weight);
run;
data master;
set transaction;
t_weight = weight;
modify master key = name;
if _IORC_ = 0 then do;
weight = coalesce(weight, t_weight);
replace;
end;
/*Suppress log messages if there are key values in transaction but not master*/
else _ERROR_ = 0;
run;
A standard warning relating to the the modify statement: if this data step is interrupted then the master dataset may be irreparably damaged, so make sure you have a backup first.
In this case I've assumed that the key variable is unique - a slightly more complex data step is needed if it isn't.
Another way to work around the lack of a from clause in the proc sql update statement would be to set up a format merge, e.g.
data v_format_def /view = v_format_def;
set transaction(rename = (name = start weight = label));
retain fmtname 'key' type 'i';
end = start;
run;
proc format cntlin = v_format_def; run;
proc sql noprint;
update master
set weight = coalesce(weight,input(name,key.))
where master.name in (select name from transaction);
run;
In this scenario I've used type = 'i' in the format definition to create a numeric informat, which proc sql uses convert the character variable name to the numeric variable weight. Depending on whether your key and status_cd columns are character or numeric you may need to do this slightly differently.
This approach effectively loads the entire transaction dataset into memory when using the format, which might be a problem if you have a very large transaction dataset. The data step approach should hardly use any memory as it only has to load 1 row at a time.

How to change the column headers of a sas dataset into an observation?

I have created a sas code which generates many sas datasets. Now I want to append all of them to a single excel file . So first I want to convert all the column headers of sas datasets as first observation. Then leave space between these datasets (adding a blank observation). How can we do it?
one way to do this would be to use dictionary.columns
proc sql;
create table Attribute as
select * from dictionary.columns;
Read through the table and check what attributes you are interested in. For your case you might be interested in the column "NAME" <- consist of the name of all columns.
Modify the table by adding where statement to the proc sql based on the identity of the column ( from which library / what type of file / name of file) e.g. where upcase(libname)= "WORK"
data attribute;
array column [ n ] $ length ;
do i=1 to n;
set attribute ( keep = name) ;
column [ i ] = name ;
end;
run;
Then I would proceed with data step. You could use macro variable to store the value of column's names by select variable into : but anyhow you still need to hardcode the size for the array n or any other method that store value into one observation . Also remember define the length and the type of array accordingly. You can give name to the variable in the result dataset Attribute by adding var1-varnafter the length at array statement.
For simplicity I use set statement to read observation one and one and store the value of column NAME, which is the official column name derived when using dictionary.columns into the array
Note that creating a non-temporary array would create variable(s) .
Add if you want to add the blank,
data younameit ;
merge attribute attribute(firstobs=2 keep=name rename=(name=_name));
output;
if name ne _name then do;
call missing(of _all_);
output;
end;
run;
As two datasets start with different observation and column names do not duplicate within one dataset, the next row of a valid observation ( derived from the first output statement in the resulting dataset would be empty due to call missing ( of _all_ ) ; output;
Sounds like you just want to combine the datasets and write the results to the Excel file. Do you really need the extra empty row?
libname out xlsx 'myfile.xlsx';
data out.report ;
set ds1 ds2 ...;
run;
Ensure that all your columns are character (or numeric, substitute numeric), then in your data step use:
array names{*} _character_;
do i=1 to dim(names);
call label(names{i}, names{i});
end;
output;

Update the values of a column in a dataset with another table

If I have Table A with two columns: ID and Mean, and Table B with a long list of columns including Mean, how can I replace the values of the Mean column in Table B with the IDs that exist in Table A?
I've tried PROC SQL UPDATE and both DATASET MERGE and DATASET UPDATE but they keep adding rows when the number of columns is not equal in both tables.
data want;
merge have1(in=H1) have2(in=H2);
by mergevar;
if H1;
run;
That will guarantee that H2 does not add any rows, unless there are duplicate values for one of the by values. Other conditions can be used as well; if h2; would do about the same thing for the right-hand dataset, and if h1 and h2; would only keep records that come from both tables.
PROC SQL join should also work fairly easily.
proc sql;
create table want as
select A.id, coalesce(B.mean, A.mean)
from A left join B
on A.id=B.id;
quit;