Compare datasets and plot in sas

I have two variables in two separate datasets in SAS. Both datasets have a primary key, Customer_Id, and another column, LVR. One dataset holds the old values of LVR. The other holds the values from the new calculation of the same column.
I need to show the differences between the two on a graph.
I merged them and then tried proc gplot to plot the two LVRs.
The merged dataset looks something like this:
Cust_id LVR_new LVR_old
111 1 2
222 2 .
333 5 4
The dataset containing LVR_new has almost twice as many rows as the one containing LVR_old, because more customers qualify under the new calculation.
The merged dataset has 3046778 observations and 3 variables.
I tried to use proc gplot using the code below:
proc plot data=djia;
plot LVR_old*LVR_new = Cust_id;
run;
This has been running for a long time, and I don't expect the results to be very useful.
Can anyone suggest how I can achieve this? I need to show the differences between the two datasets on a graph to illustrate the shift in the results.
Thanks!

Why not use PROC TTEST? There are some ODS GRAPHICS plots that PROC TTEST makes.
Your problem looks exactly like the paired comparisons example in the documentation.
http://support.sas.com/documentation/cdl/en/statug/67523/HTML/default/viewer.htm#statug_ttest_examples03.htm
proc ttest;
paired LVR_old*LVR_new;
run;
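If a direct picture of the shift is wanted alongside the test, one option is to plot the distribution of the per-customer difference rather than three million individual points. This is only a sketch: the dataset name merged and the variable diff are assumed names, and customers with a missing LVR_old are dropped because no difference can be computed for them.

```sas
/* Keep only customers present in both calculations and
   compute the change in LVR (merged and diff are assumed names) */
data lvr_diff;
  set merged;
  if nmiss(LVR_old, LVR_new) = 0;
  diff = LVR_new - LVR_old;
run;

/* Histogram of the change, with a kernel density overlay */
proc sgplot data=lvr_diff;
  histogram diff;
  density diff / type=kernel;
  xaxis label="LVR_new - LVR_old";
run;
```

A histogram summarizes the data into bins, so it should stay readable and fast even with millions of rows.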

Related

Extract 2 Columns and Attach as Rows in SAS

I have two datasets in SAS. The first looks like this (let's say it is called data 1; I'm only concerned with two columns of it):
...and the second dataset (let's say it is called data 2) looks like this:
...and I am trying to extract the second column of the first dataset and insert it into the second dataset, to achieve something that looks like this:
Basic Problem Description:
I am trying to extract two columns from a dataset in SAS and add them as rows to a second dataset. The variable names in the first dataset are in a column of their own (entitled 'variable name') and in the second dataset each variable is a column header (a variable in itself) with corresponding data. The images I provided are overly simplistic, as the actual data itself is very long.
Basically, I am trying to find functions in SAS which allow me to do this.
What I have tried
-I have tried to extract the first two columns as a table using proc sql, converted them to a data frame using a data step, sorted them, then used proc transpose to try to convert them from long to wide, then tried to use some sort of append function to tack them on to the second dataset, but append did not work.
-I have tried to merge the two sets, but the merge does not seem to work after using proc transpose.
-I have also tried transposing the second dataset and then merging them, which worked (for some reason) but then I was not able to transpose the data back (so that I can analyze it, which is my purpose in doing all of this).
What functions would I use to go about this process?
Apologies for not providing replicable data, I am more searching for recommendations for functions rather than a detailed hard solution.

To force PROC TRANSPOSE to use a variable as the source for the new variable names use the ID statement. So if you have this first dataset:
data tall;
input fruit $ count @@;
cards;
APPLE 1 PEACH 2 PEAR 2
;
You can use this code to convert it.
proc transpose data=tall out=wide;
id fruit;
var count;
run;
Then if you have another dataset (called HAVE here) that already has the variables APPLE, PEACH, PEAR, etc., just set the two together.
data want;
set wide have ;
run;

How can I display a single pie chart in SAS using gchart

So I have seven different fields/variables in a SAS table each containing 1's and 0's. I need to - if at all possible - display these seven variables in one single pie. Is this possible? If so how? When I do this: pie variable1 variable2 / options I get two pies. Is there a way for me to combine them into one?

If your indicators reflect percentages of a whole WITHOUT overlap then yes you can.
This applies to multiple-choice questions where you can select only one of the options. If it's a "select all that apply" question, then a pie chart would not be appropriate.
You cannot pass the raw indicators to the GCHART procedure directly; you first need to summarize your data. Use PROC MEANS to calculate the sums and then pass those to PROC GCHART.
Proc means data = have stackods sum;
VAR ind1-ind7;
ODS OUTPUT summary=totals;
Run;
Use the TOTALS dataset in your gchart, with the sum as the SUMVAR= variable on the PIE statement. I don't recall what the variable is called.
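A rough sketch of that last step follows. It assumes the STACKODS output names its columns Variable and Sum; the answer above does not confirm the names, so it is worth checking them with PROC CONTENTS first.

```sas
/* Check what the summary columns are actually called */
proc contents data=totals;
run;

/* One slice per indicator, sized by its total.
   Variable and Sum are assumed column names, not confirmed above. */
proc gchart data=totals;
  pie Variable / sumvar=Sum;
run;
quit;
```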

Boxplot in SAS using proc gchart

First question, is it possible to produce a boxplot using proc gchart in SAS?
If it is possible, please give me a brief idea.
Or else, on the topic of using proc boxplot.
Suppose I have a dataset that has three variables ID score year;
something like,
data aaa;
input id score year;
datalines;
1 50 2008
1 40 2007
2 30 2008
2 20 2007
;
run;
I want to produce a boxplot showing for each ID in each year. (So in this case, 4 boxplots in a single plot)
How can i achieve this?
I have tried using
proc boxplot data=aaa;
plot score*ID;
by year;
run;
However, this does not work because the data are not sorted by year.
Is there a way to get around this?

You need to sort your input dataset first. Run this
proc sort data = aaa;
by year;
run;
and then your proc boxplot should work as written.

This is quite easy to do with sgplot, which is part of the newer ODS Graphics suite which is available in base SAS.
proc sgplot data=sashelp.cars;
vbox mpg_city/category=type group=origin grouporder=ascending;
run;
You would use category=id and group=year in your example data: you get one tick on the x axis for each category, and within it a separate box for each group, clustered together.
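Applied to the example dataset aaa from the question, that would look roughly like this. Note that the sample data has only one score per id and year, so each box collapses to a single line until there are more observations per group.

```sas
/* One x-axis tick per id, one box per year within each id */
proc sgplot data=aaa;
  vbox score / category=id group=year grouporder=ascending;
run;
```

No prior sort is needed for this approach.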

How to merge multiple datasets by common variable in SAS?

I have multiple datasets, each with a different number of attributes. I want to merge them all by their common variables. This would be a 'union' if I used proc SQL, but there are hundreds of variables.
Example.
Dataset_Name Number of columns
dataset1 110
dataset2 120
dataset3 130
... ...
Say they have 100 columns in common. The final dataset, which contains all of dataset1, dataset2, dataset3, etc.,
should have only the common columns (in this case, 100 columns).
How do I do this?
And how do I get, for each dataset, the columns that are not in the final dataset?
Example: dataset1 will have 10 columns that are not in the final dataset, and I want to list the names of those 10 columns.
Thanks!!!!

UNION in SQL is equivalent to sequential SET in SAS.
data want;
set dataset1 dataset2 dataset3;
run;
Now, SAS by default includes all columns present in any dataset. To limit to just what's in all datasets, you have to use a keep statement.
You can determine this using proc sql, among other ways.
proc sql;
select C.name into :commonlist separated by ' '
from dictionary.columns C, dictionary.columns D
where C.libname=D.libname
and C.memname='DATASET1'
and D.memname='DATASET2'
and C.name=D.name
;
quit;
For more than two datasets it's more complicated and partially depends on your, but if you're comfortable in SQL you can figure that out pretty easily. A similar construct can create a list of just dataset 1 variables. The important part is the into :commonlist separated by ' ', which says to pull the select results into a macro variable called commonlist, separating rows by space. (The colon says to create a macro variable, not a table.)
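One possible way to extend this to three datasets is to count how many of them contain each column name. This is a sketch under stated assumptions: the datasets live in the WORK library, a column has the same case in every dataset (the comparison here is case-sensitive), and dset1list is simply the macro variable name used in the next step.

```sas
proc sql noprint;
  /* Columns whose name appears in all three datasets */
  select name into :commonlist separated by ' '
    from dictionary.columns
    where libname = 'WORK'
      and memname in ('DATASET1', 'DATASET2', 'DATASET3')
    group by name
    having count(*) = 3;

  /* Columns of DATASET1 that are not in that common set */
  select name into :dset1list separated by ' '
    from dictionary.columns
    where libname = 'WORK'
      and memname = 'DATASET1'
      and name not in
          (select name
             from dictionary.columns
             where libname = 'WORK'
               and memname in ('DATASET1', 'DATASET2', 'DATASET3')
             group by name
             having count(*) = 3);
quit;

/* Write both lists to the log for inspection */
%put commonlist=&commonlist;
%put dset1list=&dset1list;
```

The second query can be repeated with memname = 'DATASET2' and so on to build the other lists. Values of libname and memname in dictionary.columns are stored in uppercase, which is why the literals are uppercase.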
So you can then run:
data want (keep=&commonlist.) dset1(keep=&dset1list.) dset2(keep=&dset2list.) dset3(keep=&dset3list.);
set dataset1(in=ds1) dataset2(in=ds2) dataset3(in=ds3);
output want;
if ds1 then output dset1;
else if ds2 then output dset2;
else if ds3 then output dset3;
run;
The in=xyz indicates which dataset a row came from. Each output dataset can have a separate list of variables to keep. You might want to keep the ID variable in those other datasets as well.
I will say that usually in SAS you don't do what you're doing here: it's not easy to do because it doesn't tend to be the best way to handle things - specifically, the little split off datasets. In general you would just keep those extra variables on the master dataset, and they'd just be nulls for anyone not in a dataset with that variable - assuming it makes sense to make this 'master' dataset at all.

New SAS variable conditional on observations

(first time posting)
I have a data set where I need to create a new variable (in SAS), based on meeting a condition related to another variable. So, the data contains three variables from a survey: Site, IDnumb (person), and Date. There can be multiple responses from different people but at the same site (see person 1 and 3 from site A).
Site IDnumb Date
a 1 6/12
b 2 3/4
c 4 5/1
a 3 .
d 5 .
I want to create a new variable called Complete, but it can't contain duplicates. So, when I go to proc freq, I want site A to be counted once, using the 6/12 Date of the Completed Survey. So basically, if a site is represented twice and contains a Date in one, I want to only count that one and ignore the duplicate site without a date.
N %
Complete 3 75%
Last Month 1 25%
My question may be around the NODUP and NODUPKEY possibilities. If I do a Proc Sort (nodupkey) by Site and Date, would that eliminate obs "a 3 ."?
Any help would be greatly appreciated. Sorry for the jumbled "table", as this is my first post (hints on making that better are also welcomed).

You can do this a number of ways.
First off, you need a complete/not complete binary variable. If you're in the datastep anyway, might as well just do it all there.
proc sort data=yourdata;
by site descending date;
run;
data yourdata_want;
set yourdata;
by site descending date;
* keep only the first row per site, which has the latest date if there is one;
if first.site then do;
comp = ifn(date>0,1,0);
output;
end;
run;
proc freq data=yourdata_want;
tables comp;
run;
If you used NODUPKEY, you'd first sort it by SITE DESCENDING DATE, then by SITE with NODUPKEY. That way the latest date is up top and is the row that is kept. You also could format COMP to have the text labels you list rather than just 1/0.
You can also do it with a format on DATE, so you can skip the data step (still need the sort/sort nodupkey). Format all nonmissing values of DATE to "Complete" and missing value of date to "Last Month", then include the missing option in your proc freq.
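A minimal sketch of that format-based variant, assuming DATE is a numeric variable. The format name compfmt and the intermediate dataset names are made up for illustration.

```sas
/* Missing dates read as "Last Month", everything else as "Complete" */
proc format;
  value compfmt
    .     = 'Last Month'
    other = 'Complete';
run;

/* Put the latest date first within each site */
proc sort data=yourdata out=sorted;
  by site descending date;
run;

/* Keep the first row per site */
proc sort data=sorted out=one_per_site nodupkey;
  by site;
run;

/* MISSING makes the missing-date rows count in the table */
proc freq data=one_per_site;
  tables date / missing;
  format date compfmt.;
run;
```

PROC FREQ groups on the formatted values, so all nonmissing dates fall into a single "Complete" row.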
Finally, you could do the table in SQL (though getting two rows like that is a bit harder, you have to UNION two queries together).
