I am trying to create a meta data table for all the tables in the "xyzfolder" using the proc contents function (see below for more details). When the table "all" is created, I noticed that the "file size" column is missing. As in for table ABC in xyzfolder, I would like to look at a column "file size" to see if table ABC is greater than 1 GB.
proc contents data=xyzfolder._all_ out =work.all;
run;
Best way to do this is to use the dictionary tables. You can do that in SQL like so, or in regular SAS using sashelp.vtable.
proc sql;
create table all as
select * from dictionary.tables
where libname='SASHELP';
quit;
If filesize is known to SAS it will be listed here in the column filesize. If it's not known to SAS, you will need to provide some more information as to your operating system, the location and type of the libname, etc. to get useful responses.
Related
As there are many datasets contained in one library. How can I use SAS code to find out which dataset has the largest number of cases?
Suppose the library name is "SASHELP".
Thank you!
The SQL dictionary.* family of tables gives access to all sorts of metadata. Be careful, some dictionary requests can cause a lot of activity as it collects the information requested.
From Docs:
How to View DICTIONARY Tables
…
DICTIONARY Tables and Performance
When you query a DICTIONARY table, SAS gathers information that is pertinent to that table. Depending on the DICTIONARY table that is being queried, this process can include searching libraries, opening tables, and executing SAS views. Unlike other SAS procedures and the DATA step, PROC SQL can improve this process by optimizing the query before the select process is launched. Therefore, although it is possible to access DICTIONARY table information with SAS procedures or the DATA step by using the SASHELP views, it is often more efficient to use PROC SQL instead.
…
Note: SAS does not maintain DICTIONARY table information between queries. Each query of a DICTIONARY table launches a new discovery process.
Example:
* Use dictionary.tables to get the names of the tables with the 10 most rowcount;
proc sql;
reset outobs=10;
create table top_10_datasets_by_rowcount as
select libname, memname, nobs
from dictionary.tables
where libname = 'SASHELP'
and memtype = 'DATA'
order by nobs descending
;
reset outobs=max;
Is there an equivalent to the SAS format cntlin procedure in Teradata. I have a reference value table (code_value), which is used a lot and rather than doing many outer joins to the reference value table, I'd like to have a lookup function similar to the solution below in SAS. Any help is greatly appreciated.
data CodeValueFormat;
set grp.code_value (keep=code_value_id description);
fmtname = 'fmtCodeValue';
start = code_value_id;
label = description;
run;
proc format cntlin=work.codevalueformat;
run;
proc sql;
select foo_code_id format=fmtCodeValue.
from bar;
quit;
There is no way you can emulate SAS format cntlin procedure in Teradata or any other database other than using lookup tables. One way to avoid doing same joins again and again is to do index join. please look into below link to see whether this is what you want to do. https://info.teradata.com/HTMLPubs/DB_TTU_16_00/index.html#page/Database_Management%2FB035-1094-160K%2Fqiq1472240587768.html%23wwID0EFK1R
One another way is to maintain a denormalized table and do joins with your incremental/daily records in your staging area and then append this records to your final table
I am working in SAS Enterprise guide and have a one column SAS table that contains unique identifiers (id_list).
I want to filter another SAS table to contain only observations that can be found in id_list.
My code so far is:
proc sql noprint;
CREATE TABLE test AS
SELECT *
FROM data_sample
WHERE id IN id_list
quit;
This code gives me the following errors:
Error 22-322: Syntax error, expecting on of the following: (, SELECT.
What am I doing wrong?
Thanks up front for the help.
You can't just give it the table name. You need to make a subquery that includes what variable you want it to read from ID_LIST.
CREATE TABLE test AS
SELECT *
FROM data_sample
WHERE id IN (select id from id_list)
;
You could use a join in proc sql but probably simpler to use a merge in a data step with an in= statement.
data want;
merge oneColData(in = A) otherData(in = B);
by id_list;
if A;
run;
You merge the two datasets together, and then using if A you only take the ID's that appear in the single column dataset. For this to work you have to merge on id_list which must be in both datasets, and both datasets must be sorted by id_list.
The problem with using a Data Step instead of a PROC SQL is that for the Data step the Data-set must be sorted on the variable used for the merge. If this is not yet the case, the complete Data-set must be sorted first.
If I have a very large SAS Data-set, which is not sorted on the variable to be merged, I have to sort it first (which can take quite some time). If I use the subquery in PROC SQL, I can read the Data-set selectively, so no sort is needed.
My bet is that PROC SQL is much faster for large Data-sets from which you want only a small subset.
I use sas manipulate some database tables. There is a column "CDR$UNPIVOT$SKEY" in one table. I can't direct use this column name in data step or Proc SQL, even i change the option of VALIDVARNAME=ANY.
Is there a way to direct use such column name?
Thank you!
Yep. You are very close now. All you need is to address it using its literal format.
option validvarname=any;
data want;
'CDR$UNPIVOT$SKEY'n = 'I can see now';
run;
proc print data=want;run;
I have a SAS dataset with numeric variables to, from, and weight. Some of the observations have value 0 for weight. I need all the weight values to be positive, so I wish to simply add 1 to all weight values.
How can I do that using Proc SQL?
I have tried the following, but it doesn't work:
proc sql;
update mylib.mydata
set weight=weight+1;
quit;
The error is:
ERROR: A CURRENT-OF-CURSOR operation cannot be initiated because
the column "weight" cannot be used to uniquely identify a row
because of its data type.
Also, mylib refers to a Greenplum appliance. This might be the problem...
If you have the database permissions to update that table, you might want to use the SAS/Access pass-through facility. You will need to know the correct syntax for this to work. Here is a non-working example:
proc sql;
connect to greenplm as dbcon
(server=greenplum04 db=sample port=5432 user=gpusr1 password=gppwd1);
execute (
/* Native code goes here */
update sample.mydata
set weight=weight+1
) by dbcon;
quit;
The connection string would be the same as used on the LIBNAME that defined your "mylib' libref.
However, if you are really trying to create a SAS dataset (not update the real table), you can do that with a simple data step:
data mydata;
set mylib.mydata
weight = weight + 1;
run;
That will create a copy of the table that can be used with other SAS procedures.
Check out this note at prosgress.com. You probably need to add UPDATE_MULT_ROWS=YES to your library definition.