I want to copy only 3 of the 7 columns from dataset 'A' into dataset 'B'.
dataset A has (p,q,r,s,t,u,v)
I want to copy p,q,t in a new dataset B.
This is a more efficient way to do it:
data B;
set A (keep=p q t);
run;
Because the KEEP= option on the SET statement means that only these columns are read in the first place. A KEEP statement outside the SET statement still reads in every column and drops the unwanted ones afterwards.
We can use 'keep' keyword.
data B;
set A;
keep p q t;
run;
What do you mean by copying the columns? Does dataset B already exist? If that's the case, you simply need to merge the two datasets and use a KEEP statement when reading them. If you need to create a new dataset, it's even simpler:
data B;
set A;
keep p q t;
run;
Hope it helps. If you need the merge, please post and I will explain further.
Related
I have an Excel file that always has the same name, but the contents of the table change. I am looking to write code that names the table based on the value in one of the cells.
For example:
If cell A3 equals "Employment Information", I want the table to be named "Jobs".
If cell A3 equals "Inflation Information", I want the table to be named "Currency".
etc.
I want to define ONE macro (i.e. %table(filename,cell)), or ONE loop of if then else statements to achieve this. Unfortunately, I can't seem to wrap my head around this logically. If someone with experience in SAS could help me out that would be awesome. I will edit my question soon to include some codes that I have already tried but which have failed to get the job done.
You need to read the data to find the content. You could then create a macro variable to make it easy to rename the dataset using PROC DATASETS.
Let's assume you have converted the Excel sheet into a dataset named WORK.HAVE. Let's also assume that you know what variable contains the data from column A, let's call that variable A. Is there anything in the data that makes it possible to tell which observation is the one to use? For now let's just assume that by A3 you mean the second observation since the first row of the sheet should have the variable names.
So in that case you want something like this:
%let newname=have;
data _null_;
set have (firstobs=2);
if A="Employment Information" then call symputx('newname','Jobs');
else if A="Inflation Information" then call symputx('newname','Currency');
stop;
run;
proc datasets nolist lib=work;
change have=&newname;
run;
quit;
So I'm working with a data set that has millions of rows. I'm trying to cut down the number of rows, so that I can merge this data set and another data set by zipcode.
What I'm trying to do is take a specific column "X6" and search through it for the value of "357". Then every row that has that value I want to move into a new data set.
I'm assuming that I'm going to have to use some form of if/then statement, but I can't get anything to work successfully. If needed I can post a snapshot of some of my data or what SAS code I currently have. I've seen other things that are similar, but none of them involve SAS.
Thanks for all of your help in advance.
RamB gave a great way to parse into two datasets.
If you just want a new dataset that is a subset of the original, the following will work well
DATA NEW;
SET ORIGINAL;
IF X6="357"; /* NOTE: THIS ASSUMES X6 IS DEFINED AS CHARACTER */
RUN;
The IN operator can also test multiple criteria at once. Say you wanted to keep records where X6 is 357 or 588.
DATA NEW;
SET ORIGINAL;
IF X6 IN("357","588"); /* NOTE: THIS ASSUMES X6 IS DEFINED AS CHARACTER */
RUN;
Lastly, NOT IN (also written NOTIN) works the same way to exclude values.
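As a minimal sketch of the exclusion case, assuming X6 is character as in the examples above (the output dataset name NEW_EXCLUDED is made up for illustration):

```sas
/* Keep every row whose X6 is neither "357" nor "588". */
/* Assumes X6 is a character variable; NEW_EXCLUDED is a hypothetical name. */
DATA NEW_EXCLUDED;
    SET ORIGINAL;
    IF X6 NOT IN ("357", "588");
RUN;
```

If X6 turns out to be numeric, the same statement should work with the quotes removed.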
With data step this is really simple. I'll give you an example.
data dataset_with_357
original_without_357;
set original_dataset;
if compress(x6) = "357" then output dataset_with_357;
else output original_without_357;
run;
As I said, there are several ways of doing this, and it wasn't clear to me which is better for you.
Just use PROC SQL to create your dataset, then reference the value you're looking for in your query:
Proc SQL;
Create table new as
Select *
From dataset
Where x6 = 357
;
Quit;
This assumes your x6 variable is numeric. If it is character, compare against the quoted value '357' instead.
Is there a way in SAS to create disjoint datasets using the SET statement?
I have tried:
DATA OnlyFirst OnlySecond InBoth;
SET firstds(IN=A)
seconds(IN=B);
IF A AND NOT B THEN OUTPUT OnlyFirst;
IF B AND NOT A THEN OUTPUT OnlySecond;
IF A AND B THEN OUTPUT InBoth;
Run;
But this does not create disjoint sets.
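That's not how the SET statement works. SET reads the datasets one after the other, so each observation comes from exactly one of them and A and B are never both true. You should be able to use a MERGE instead, provided you first make sure firstds and seconds are both sorted by a key variable (or variables) they share. You then need to reference that shared variable in a BY statement.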
DATA OnlyFirst OnlySecond InBoth;
merge firstds(IN=A)
seconds(IN=B);
by <shared key variable(s)>;
IF A AND NOT B THEN OUTPUT OnlyFirst;
IF B AND NOT A THEN OUTPUT OnlySecond;
IF A AND B THEN OUTPUT InBoth;
Run;
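For illustration, here is the same step with the sort spelled out. This is only a sketch: the key variable name id is an assumption, so substitute whatever variable(s) the two datasets actually share.

```sas
/* "id" is a hypothetical key; replace it with the real shared variable(s). */
proc sort data=firstds; by id; run;
proc sort data=seconds; by id; run;

data OnlyFirst OnlySecond InBoth;
    merge firstds(in=A)
          seconds(in=B);
    by id;
    /* Each BY-matched observation goes to exactly one output dataset. */
    if A and not B then output OnlyFirst;
    else if B and not A then output OnlySecond;
    else output InBoth;
run;
```

If the key is not unique within each dataset, the MERGE matching rules get more involved, so check the row counts before relying on the result.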
I have written a macro that uses proc univariate to calculate custom quantiles for variables in a dataset (say dsn1): %cust_quants(dsn= , varlist= , quant_list= ). The output is a summary dataset (say dsn2) that looks something like the following:
q_1     q_2.5  q_50  q_80  q_97.5  q_99   var_name
1       2.5    50    80    97.5    99     ex_var_1_100
-2      10     25    150   500     20000  ex_var_pos_skew
-20000  -500   -150  0     10      50     ex_var_neg_skew
What I would like to do is to use the summary dataset to cap/floor extreme values in the original dataset. My idea is to extract the column of interest (say q_99) and put it into a vector of macro-variables (say q_99_1, q_99_2, ..., q_99_n). I can then do something like the following:
/* create summary of dsn1 as above example */
%cust_quants(dsn= dsn1, varlist= ex_var_1_100 ex_var_pos_skew ex_var_neg_skew,
quant_list= 1 2.5 50 80 97.5 99);
/* cap dsn1 var's at 99th percentile */
data dsn1_cap;
set dsn1;
if ex_var_1_100 > &q_99_1 then ex_var_1_100 = &q_99_1;
if ex_var_pos_skew > &q_99_2 then ex_var_pos_skew = &q_99_2;
/* don't cap neg skew */
run;
In R, it is very easy to do this. One can extract sub-data from a data-frame using matrix like indexing and assign this sub-data to an object. This second object can then be referenced later. R example--extracting b from data-frame a:
> a <- as.data.frame(cbind(c(1,2,3), c(4,5,6)))
> print(a)
V1 V2
1 1 4
2 2 5
3 3 6
> a[, 2]
[1] 4 5 6
> b <- a[, 2]
> b[1]
[1] 4
Is it possible to do the same thing in SAS? I want to be able to assign one or more columns of sub-data to a macro variable / array, so that I can then use the macro / array within a second data step. One thought is PROC SQL with INTO:
proc sql noprint;
select v2 into :v2_macro separated by " "
from a;
quit;
However, this creates a single string variable when what I really want is a vector of variables (or array--no vectors in SAS). Another thought is to add %scan (assuming this is inside a macro):
proc sql noprint;
select v2 into :v2_macro separated by " "
from a;
quit;
%let i = 1;
/* split on blanks only, so values containing a period stay whole */
%do %until(%scan(&v2_macro, &i, %str( )) = );
%let var_&i = %scan(&v2_macro, &i, %str( ));
%let i = %eval(&i + 1);
%end;
This seems inefficient and takes a lot of code. It also requires the programmer to remember which var_&i corresponds to each future purpose. Is there a simpler / cleaner way to do this?
**Please let me know in the comments if this is enough background / example. I'm happy to give a more complete description of why I'm doing what I'm attempting if needed.
First off, I assume you are talking about SAS/Base not SAS/IML; SAS/IML is essentially similar to R and has the same kind of operations available in the same manner.
SAS/Base is more similar to a database language than a matrix language (though it has some elements of both, and some elements of an OOP language, as well as being a full-featured procedural programming language).
As a result, you do things somewhat differently in order to achieve the same goal. Additionally, because of the cost of moving data in a large data table, you are given multiple methods to achieve the same result; you can choose the appropriate method for the required situation.
To begin with, you generally should not store data in a macro variable in the manner you suggest. It is bad programming practice, and it is inefficient (as you have already noticed). SAS Datasets exist to store data; SAS macro variables exist to help simplify your programming tasks and drive the code.
Creating the dataset "b" as above is trivial in Base SAS:
data b;
set a;
keep v2;
run;
That creates a new dataset with the same rows as A, but only the second column. KEEP and DROP allow you to control which columns are in the dataset.
However, there would be very little point in this dataset, unless you were planning on modifying the data; after all, it contains the same information as A, just less. So for example, if you wanted to merge V2 into another dataset, rather than creating b, you could simply use a dataset option with A:
data c;
merge z a(keep=v2);
by id;
run;
(Note: I presuppose an ID variable of some form to combine A and Z.)
This merge adds the v2 column to z, in a new dataset, c. This is equivalent to horizontally concatenating two matrices (a straight-up concatenation would remove the 'by id;' requirement, but in databases you do not typically do that, as order is not guaranteed to be what you expect).
If you plan on using b to do something else, how you create and/or use it depends on that usage. You can create a format, which is a mapping of values [ie, 1='Hello' 2='Goodbye'] and thus allows you to convert one value to another with a single programming statement. You can load it into a hash table. You can transpose it into a row (proc transpose). Supply more detail and a more specific answer can be provided.
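As one possible illustration of the hash table route, here is a sketch that caps values at the 99th percentile by looking them up in the summary dataset. It assumes dsn2 contains a character column var_name and a numeric column q_99 as in the question, and that the var_name values match the variable names in dsn1 exactly, including case. The name caps and the loop variable i are made up for the example.

```sas
data dsn1_cap;
    if _n_ = 1 then do;
        /* Define var_name and q_99 in this step without reading any rows. */
        if 0 then set dsn2(keep=var_name q_99);
        /* Load the summary dataset into a hash keyed by variable name. */
        declare hash caps(dataset: "dsn2(keep=var_name q_99)");
        caps.defineKey("var_name");
        caps.defineData("q_99");
        caps.defineDone();
    end;
    set dsn1;
    /* Only the variables listed here are capped (no cap on the neg skew one). */
    array vars{*} ex_var_1_100 ex_var_pos_skew;
    do i = 1 to dim(vars);
        var_name = vname(vars{i});
        /* find() returns 0 when the key exists and loads q_99 for it. */
        if caps.find() = 0 and vars{i} > q_99 then vars{i} = q_99;
    end;
    drop i var_name q_99;
run;
```

The advantage over numbered macro variables is that each cap is found by variable name, so nobody has to remember which position corresponds to which variable.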
I need to change the variable length in an existing dataset. I can change the format and informat but not the length. I get an error. The documentation says this is possible but there are no examples.
Here is my issue. My data source could change so I don't want to pre define columns on import. I want to do a generic import and then look for certain columns and adjust the length.
I have tried PROC SQL and DATA steps. It looks like the only way to do this is to recreate the dataset or the column, which I don't want to do.
Thanks,
Donnie
If you put your LENGTH statement before the SET statement, in a Data step, you can change the length of a variable. Obviously, you will get truncation if you have data longer than your new length.
However, using a DATA step to change the length is also re-creating the data set, so I'm confused by that part of your question.
The only way to change the length of a variable in a datastep is to define it before a source (SET) dataset is read in.
Alternatively, you can use an ALTER TABLE statement in PROC SQL, which SAS supports.
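A minimal sketch of that approach, assuming a character column: the table name work.imported and the column name customer_name are hypothetical placeholders.

```sas
proc sql;
    /* Widen a character column to 200 bytes. */
    /* work.imported and customer_name are made-up names for illustration. */
    alter table work.imported
        modify customer_name char(200);
quit;
```

SAS may still have to rewrite the data behind the scenes to apply the new length, so this is more a convenience than a way of avoiding the rebuild.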
The length of a variable stays the same once the dataset has been read by SET. Add a LENGTH statement before the SET statement if you need to change the length of a column.
data a;
length a b c $200 ;
set b ;
run ;