Hi, I just wanted to know what the code would be in a data step for this outcome.
I have the following:
and need the below:
Please direct me to another page if this was answered previously. In addition, how should I ask something like this in the future?
Thanks
Do you really need to do this in a data step? You might as well use proc sql for a Cartesian product.
proc sql;
create table want as
select code, groupds
from (select distinct code from have) a,
(select distinct groupds from have) b;
quit;
Here is how you can do it in a data step.
data want;
set have(keep=code where=(not missing(code)));
do i=1 to n;
/* POINT= cannot be combined with the WHERE= data set option, so filter with IF instead */
set have(keep=groupds) point=i nobs=n;
if not missing(groupds) then output;
end;
run;
The issue with this method is that duplicates are not removed: if a code or groupds value appears more than once in have, a record is created for each duplicate entry.
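One way around the duplicate problem is to reduce each variable to its distinct values first and then run the same POINT= loop. The sketch below assumes a dataset have with character variables code and groupds; the sample values and the dataset names codes and groups are made up for illustration.

```sas
/* Hypothetical sample data; the values are made up for illustration */
data have;
input code $ groupds $;
datalines;
A X
A Y
B X
;
run;

/* Distinct non-missing values of each variable */
proc sort data=have(keep=code where=(not missing(code)))
          out=codes nodupkey;
by code;
run;

proc sort data=have(keep=groupds where=(not missing(groupds)))
          out=groups nodupkey;
by groupds;
run;

/* Cross join: for each code, read every distinct groupds by observation number */
data want;
set codes;
do i=1 to n;
set groups point=i nobs=n;
output;
end;
run;
```

With this sample data, want has four observations, one for each combination of the two distinct codes and the two distinct groupds values.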
Is there an equivalent to the SAS PROC FORMAT CNTLIN= option in Teradata? I have a reference value table (code_value) that is used a lot, and rather than doing many outer joins to it, I'd like to have a lookup function similar to the SAS solution below. Any help is greatly appreciated.
data CodeValueFormat;
set grp.code_value (keep=code_value_id description);
fmtname = 'fmtCodeValue';
start = code_value_id;
label = description;
run;
proc format cntlin=work.codevalueformat;
run;
proc sql;
select foo_code_id format=fmtCodeValue.
from bar;
quit;
There is no way to emulate the SAS PROC FORMAT CNTLIN= approach in Teradata, or in any other database, other than by using lookup tables. One way to avoid doing the same joins again and again is to use a join index. Please look at the link below to see whether this is what you want to do. https://info.teradata.com/HTMLPubs/DB_TTU_16_00/index.html#page/Database_Management%2FB035-1094-160K%2Fqiq1472240587768.html%23wwID0EFK1R
Another way is to maintain a denormalized table: join your incremental/daily records to the lookup in your staging area, then append these records to your final table.
I have 12 columns and I want to add them through sql. I have tried:
proc sql;
select*,sum(a1-a12) as total
from tablename;
quit;
However, this isn't working. Is there an alternative, or can we use single and double dash variable lists only in data steps?
If you want to add values in the same observation, then you need to use the SAS function sum(,...) and not the SQL aggregate function sum(). Your current code looks like the latter, since it only has one value listed: the difference between variables A1 and A12. This is because PROC SQL does not recognize variable lists. You will need to list all of your variables.
select *,sum(a1,a2,a3,a4,a5,a6,a7,a8,a9,a10,a11,a12) as total
from have
;
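If typing out all the names is impractical, one option is to build the comma-separated list from DICTIONARY.COLUMNS into a macro variable. This is only a sketch: the table name have in the WORK library, the macro variable name varlist, and the three-column sample data are assumptions for illustration, and the LIKE pattern picks up every column whose name starts with A, not only a1 to a12.

```sas
/* Hypothetical sample table with a1-a3; the question has a1-a12 */
data have;
a1=1; a2=2; a3=.;
run;

proc sql noprint;
/* Collect the names of all columns starting with A into a comma-separated list */
select name into :varlist separated by ','
from dictionary.columns
where libname='WORK' and memname='HAVE' and upcase(name) like 'A%';
quit;

proc sql;
/* &varlist resolves to a1,a2,a3 here, so sum() acts as the row-wise SAS function */
select *, sum(&varlist) as total
from have;
quit;
```

Note that if only one column matched, sum() would receive a single argument and behave as the SQL aggregate again.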
If you want this in SQL because you're making use of other SQL functionality in addition to this, make a view.
data have_v/view=have_v;
set have;
total = sum(of a1-a12);
run;
proc sql;
select * from have_v; *presumably you do other things here;
quit;
In some cases you do not know how many variables there are, or you don't want to hard code the list. In that case you can use a name prefix list: sum(of <prefix>:), which covers every variable whose name starts with that prefix.
data test;
a1=1;
a2=2;
/*number 3 is missing*/
a4=4;
a5=5;
run;
data test2;
set test;
sum_of_all_As= sum(of a:);
run;
For more tips and tricks see: http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000245953.htm
I'm new to SAS and have some problems with adding a column to existing data set in SAS using MODIFY statement (without proc sql).
Let's say I have data like this
id name salary perks
1 John 2000 50
2 Mary 3000 120
What I need to get is a new column with the sum of salary and perks.
I tried to do it this way
data data1;
modify data1;
money=salary+perks;
run;
but apparently it doesn't work.
I would be grateful for any help!
As #Tom mentioned, you use SET to access the dataset.
I generally don't recommend programming this way, with the same name in the SET and DATA statements, especially while you're learning SAS. It makes errors harder to detect: once a step with a logic error has run, your original dataset has been overwritten and you have to recreate it before you can start again.
If you want to work step by step, consider intermediary datasets and then clean up after you're done by using proc datasets to delete any unnecessary intermediary datasets. Use a naming convention so you can drop them all at once, e.g. data1, data2, data3 can be referenced as data1-data3 or data:.
data data2;
set data1;
money = salary + perks;
run;
You do now have two datasets but it's easy to drop datasets later on and you can now run your code in sections rather than running all at once.
Here's how you would drop intermediary datasets
proc datasets library=work nodetails nolist;
delete data1-data3;
run;quit;
You can't add a column to an existing dataset. You can make a new dataset with the same name.
data data1;
set data1;
money=salary+perks;
run;
SAS will build it as a new physical file (with a temporary name) and when the step finishes without error it deletes the original and renames the new one.
If you want to use a data step, you do it like this:
data dataset;
set dataset;
format new_column $12.;
new_column = 'xxx';
run;
Or use Proc SQL and ALTER TABLE.
proc sql;
alter table dataset
add new_column char(8) format = $12.
;
quit;
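Applied to the salary and perks example from the question, the ALTER TABLE route could look like the sketch below. The sample rows are taken from the question; pairing ALTER TABLE with an UPDATE statement to fill the new column is one possible approach, not the only one.

```sas
/* Sample data from the question */
data data1;
input id name $ salary perks;
datalines;
1 John 2000 50
2 Mary 3000 120
;
run;

proc sql;
/* Add an empty numeric column to the existing table */
alter table data1
add money num;
/* Fill it in place with the sum of salary and perks */
update data1
set money = salary + perks;
quit;
```

After this runs, money is 2050 for John and 3120 for Mary.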
I have data with various types of loan descriptions; there are at least 100 of them.
I have to categorise them into various buckets using IF/THEN statements. Please have a look at the code below for reference.
data des;
set desc;
if loan_desc in ('home_loan','auto_loan')then product_summary ='Loan';
if loan_desc in ('Multi') then product_summary='Multi options';
run;
For illustration I have shown it for just two loan descriptions, but I have around 1000 different loan_desc values that I need to categorise into different buckets.
How can I categorise these loan descriptions into different buckets without writing product_summary and loan_desc again and again in the code, which makes it very lengthy and time consuming?
Please help!
Another option for categorizing is using a format. This example uses a manual statement, but you can also create a format from a dataset if you have the to/from values in a dataset. As indicated by #Tom this allows you to change only the table and the code stays the same for future changes.
One note regarding your current code: you're using separate IF/THEN statements rather than IF/THEN/ELSE IF. You should chain the conditions with ELSE, because evaluation then stops as soon as one condition is met rather than testing every condition.
proc format;
value $loan_fmt
'home_loan', 'auto_loan' = 'Loan'
'Multi' = 'Multi options';
run;
data want;
set have;
/* look up the bucket for each loan description */
product_summary = put(loan_desc, $loan_fmt.);
run;
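When the from/to values already live in a dataset, the same format can be built with the CNTLIN= option instead of a hand-written VALUE statement. The sketch below assumes a hypothetical mapping dataset loan_map with the variables loan_desc and product_summary; the dataset names loan_map and loan_cntl are illustrative.

```sas
/* Hypothetical mapping table: one row per loan description */
data loan_map;
length loan_desc $9 product_summary $13;
loan_desc='home_loan'; product_summary='Loan'; output;
loan_desc='auto_loan'; product_summary='Loan'; output;
loan_desc='Multi'; product_summary='Multi options'; output;
run;

/* Control dataset for PROC FORMAT: START and LABEL hold the mapping,
   TYPE='C' marks the format as a character format */
data loan_cntl;
set loan_map(rename=(loan_desc=start product_summary=label));
retain fmtname 'loan_fmt' type 'C';
run;

proc format cntlin=loan_cntl;
run;

/* Hypothetical input data to show the lookup */
data have;
length loan_desc $9;
loan_desc='home_loan'; output;
loan_desc='Multi'; output;
run;

data want;
set have;
product_summary = put(loan_desc, $loan_fmt.);
run;
```

New categories then only require a new row in loan_map and a rerun of the format step; the data step that applies the format stays unchanged.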
For a mapping exercise like this, the best technique is to use a mapping table. This is so the mappings can be changed without changing code, among other reasons.
A simple example is shown below:
/* create test data */
data desc (drop=x);
do x=1 to 3;
loan_desc='home_loan'; output;
loan_desc='auto_loan'; output;
loan_desc='Multi'; output;
loan_desc=''; output;
end;
run;
data map;
/* declare lengths up front so 'Multi options' is not truncated */
length loan_desc $9 product_summary $13;
loan_desc='home_loan'; product_summary='Loan'; output;
loan_desc='auto_loan'; product_summary='Loan'; output;
loan_desc='Multi'; product_summary='Multi options'; output;
run;
/* perform join */
proc sql;
create table des as
select a.*
,coalescec(b.product_summary,'UNMAPPED') as product_summary
from desc a
left join map b
on a.loan_desc=b.loan_desc;
quit;
There is no need to use the macro language for this task (I have updated the question tag accordingly).
Good solutions have already been proposed (I like #Reeza's proc format solution), but here's another route that also minimizes coding.
Generate sample data
data have;
loan_desc="home_loan"; output;
loan_desc="auto_loan"; output;
loan_desc="Multi"; output;
loan_desc=""; output;
run;
Using PROC SQL's case expression
This way doesn't allow, to my knowledge, having several criteria on a single when line, but it really simplifies coding since the resulting variable's name needs to be written down only once.
proc sql;
create table want as
select
loan_desc,
case loan_desc
when "home_loan" then "Loan"
when "auto_loan" then "Loan"
when "Multi" then "Multi options"
else "Unknown"
end as product_summary
from have;
quit;
Otherwise, using the following syntax is also possible, giving the same results:
proc sql;
create table want as
select
loan_desc,
case
when loan_desc in ("home_loan", "auto_loan") then "Loan"
when loan_desc = "Multi" then "Multi options"
else "Unknown"
end as product_summary
from have;
quit;
I need to perform a procedure on a small subset (e.g. 100 rows) of a very big table just to test the syntax and output. I have been running the following code for a while and it's still running. I wonder if it is doing something else. What is the right way to do this?
Proc sql inobs = 100;
select
Var1,
sum(Var2) as VarSum
from BigTable
Group by
Var1;
Quit;
What you're doing is fine (limiting the maximum number of records taken from any table to 100), but there are a few alternatives. To avoid any execution at all, use the noexec option:
proc sql noexec;
select * from sashelp.class;
quit;
To restrict the obs from a specific dataset, you can use the data set obs option, e.g.
proc sql;
select * from sashelp.class(obs = 5);
quit;
To get a better idea of what SAS is doing behind the scenes in terms of index usage and query planning, use the _method and _tree options (and optionally combine with inobs as above):
proc sql _method _tree inobs = 5;
create table test as select * from sashelp.class
group by sex
having age = max(age);
quit;
These produce quite verbose output which is beyond the scope of this answer to explain fully, but you can easily search for more details if you want.
For further details on debugging SQL in SAS, refer to
http://support.sas.com/documentation/cdl/en/sqlproc/62086/HTML/default/viewer.htm#a001360938.htm