I am doing a Proc Freq on a a large amount of User Entered Data, I would like to know if I can Combine the Results Rows based on the Contents of the first column.
You appear to want to perform a frequency of the first word (or 1st scanned part of a column). Such a case will require data manipulation to reduce the longer value to the desired shortened value, in a different variable, to be frequency binned.
data have;
input;
user_entered_data = _infile_;
datalines;
Nyfaria - January
Nyfaria - Febuary
Michelangelo - January
Michelangelo - Feburary
run;
data have_for_freq;
set have;
item = scan (user_entered_data,1,' ');
run;
options nocenter;
ods noproctitle;
proc freq data=have_for_freq;
title "Freq of raw data";
table user_entered_data;
run;
proc freq data=have_for_freq;
title "Freq of raw data formatted as $4.";
table user_entered_data;
format user_entered_data $4.;
run;
proc freq data=have_for_freq;
title "Freq of raw data - item scanned out";
table item;
run;
Note: In some cases you can use a format to control the mapping of a raw value to a reported value. There is no format that returns the first 'word' of a value (such as scan does)
Related
I have 8 tables, all containing the same order and number of columns, while one specific column named ATTRIBUTE contains different data which is of length 4 to 25. When I use PROC SQL and UNION ALL tables, the ATTRIBUTE column data length in minimizes to the lowest (4 digits).
How do I solve that i.e keeping full length of the data ?
Example, per #Lee
data have1;
attrib name length=$10 format=$10.;
name = "Anton Short";
run;
data have2;
attrib name length=$50 format=$50.;
name = "Pippy Longstocking of Stoyville";
run;
* column attributes such as format, informat and label of the selected columns
* in the result set are 'inherited' on a first found first kept order, dependent on
* the SQL join plan (i.e. the order of the tables as coded for the query);
proc sql;
create table want as
select name from have1 union
select name from have2
;
proc contents data=want varnum;
run;
Format is shorted than Length, any output display of longer values will appear to have been truncated at the data level.
* attributes of columns can be reset,
* (cleared so as to be dealt with in default manners),
* without rewriting the entire data set;
proc datasets nolist lib=work;
modify want;
attrib name format=; * format= removes the format of a variable;
run;
proc contents data=want varnum;
run;
My question is about the append of two different tables that are supposed to have the same name/format/type/length variables.
I am trying to create a step in my SAS program where I don't allow my program to be executed if the format/type/length of variables with the same name is not the same.
For example, when in one table I have a date in type string "dd-mm-yyyy" and in the other table I have the "yyyy-mm-dd" or "dd-mm-yyyy hh:mm:ss". After the append, our daily executions based on these input tables didn't work as expected. Sometimes the values come up as missing or out of order, since the formats are different.
I tried using the PROC COMPARE statement, which allowed me to check which variables have Differing Attributes (Type, Length, Format, InFormat and Labels).
proc compare base = SAS-data-set
compare = SAS-data-set;
run;
However, I only got the info on which variables have differing atributes (listing of common variables with differing attributes), not being able to do anything with/about it.
On the other hand, I would like to know if there's a chance to have a structured output table with this information, in order to use it as a control statement.
Creating an automatic task to do it would save me a lot of time.
Screenshot of an example:
You can use Proc CONTENTS to get information about a data sets variables. Do that for both data sets, and then you can use Proc COMPARE to create a data set informing you of the variable attributes differences.
data cars1;
set sashelp.cars (obs=10);
date = today ();
format date date9.;
cars1_only = 1;
x = 1.458; label x = "x-factor";
run;
data cars2;
length type $50;
set sashelp.cars (obs=10);
format date yymmdd10.;
cars2_only = 1;
X = 1.548; label x = "X factor to apply";
run;
proc contents noprint data=cars1 out=cars1_contents;
proc contents noprint data=cars2 out=cars2_contents;
run;
data cars1_contents;
set cars1_contents;
upName = upcase(Name);
run;
data cars2_contents;
set cars2_contents;
upName = upcase(Name);
run;
proc sort data=cars1_contents; by upName;
proc sort data=cars2_contents; by upName;
run;
proc compare noprint
base=cars1_contents
compare=cars2_contents
outall
out=cars_contents_compare (where=(_TYPE_ ne 'PERCENT'))
;
by upName;
run;
There is also an ODS table you can capture directly without having to run Proc CONTENTS, but the capture is not 'data-rific'
ods output CompareVariables=work.cars_vars;
proc compare base=cars1 compare=cars2;
run;
I have 2 variables and 3 records in a sas data set, and based on the date field in that data set, I need to read different monthly data sets.
For example,
I have
item no. Date
1 30Jun2015
2 31Jul2015
3 31Aug2015
When I read the first record, then based on the date field (30jun2015) here, it should merge another dataset suffixed with 30jun2015 with this current dataset.
How can I achieve that?
So as I'll hazard a guess what you're looking for I've left a bit of a gap where you'll have to specifiy the criteria for your own merge.
1) Read in base data
data MAIN_DATA;
infile cards;
input ITEM_NO DATE:date9.;
format DATE date9.;
cards;
1 30JUN2015
2 31JUL2015
3 31AUG2015
;
run;
2) Store all dates: into macro variables date1 to daten. Assuming ddmmyy6. is a good format for your table names
Data _null_;
Set Main_data;
Call symputx('date'||strip(_n_),put(DATE,ddmmyy6.));
Call symputx('daten', _n_);
Run;
3) Read in the variables and read the associated table - you haven't specified how to do the merge so I'll leave that up to you
%macro readin;
%do i = 1 %to &daten;
data NEW_TABLE_&&date&i..;
set TEST_&&date&i..; /*in this step you can merge on the original table however you intend to*/
run;
%end;
%mend readin;
%readin;
I have a sas data set with 564 variables. I need to create a new table with three columns, column1 will be the variables name , column2 will be the value of that variable, and column3 will be observation number.
So if I have a variable gender then gender will be listed 2 times in the variable column, and the gender values will be listed as m in the first row and female in the second row, and the column3 will be just the number of the observation. This is how it should look. Many thanks in advance.
var value obs
gender m 1
gender f 2
ans yes 3
ans no 4
The key to getting a table out of PROC FREQ that has all the values in one column is ods output combined with coalescec.
ODS OUTPUT lets you tell PROC FREQ to put everything into one dataset (as opposed to out= which just puts one Freq table into one dataset). That gives you a slightly messy result, which we then use coalescec to fix. That function takes a list of variables and returns the first nonmissing value from them; since the F_ variables always have just one value populated (the formatted value of the variable in the table), it's easy to use them.
ods output onewayfreqs=freqs;
proc freq data=sashelp.class;
tables age sex;
run;
ods output close; *technically unneeded but makes it more clear;
data want;
set freqs;
value = left(coalescec(of f_:));
run;
The rest of what you note above is trivial from that dataset.
I'd like to get a frequency table that lists all variables, but only tells me the number of times "-2", "-1" and "M" appear in each variable.
Currently, when I run the following code:
proc freq data=mydata;
tables _ALL_
/list missing;
I get one table for each variable and all of its values (sometimes 100s). Can I just get tables with the three values I want, and everything else suppressed?
You can do this a number of ways.
First off, you probably want to do this to a dataset first to allow you to filter that dataset. I would use PROC TABULATE, but you can use PROC FREQ if you like it better.
*make up some data;
data mydata;
call streaminit(132);
array x[100];
do _i = 1 to 50;
do _t = 1 to dim(x);
x[_t]= floor(rand('Uniform')*9-5);
end;
output;
end;
keep x:;
run;
ods _all_ close; *close the 'visible' output types;
ods output onewayfreqs=outdata; *output the onewayfreqs (one way frequency tables) to a dataset;
proc freq data=mydata;
tables _all_/missing;
run;
ods output close; *close the dataset;
ods preferences; *open back up your default outputs;
Then filter it, and once you've done that print it however you want. Note in the PROC FREQ output, you get a column for each different variable - not super helpful. The F_ variables are the formatted values, which can then be combined using coalesce. I assume here they're all numeric variables - define f_val as character and use coalescec if there are any character variables or variables with character-ish formats applied to them.
data has_values;
set outdata;
f_val = coalesce(of f_:);
keep table f_val frequency percent;
if f_val in (0,-1,-2);
run;
The last line keeps only the 0,-1,-2.