SAS for academic [closed] - sas

Closed 6 months ago.
I have an issue with the web-based SAS for Academics when I use PROC PRINT with a BY statement.
The output doesn't show all values of the grouping variable, as shown below.
Example: I print the table by the variable "Country", which has many values: Venezuela, USA, ...
but the output only displays the group for Venezuela.
Code:
proc print data=data.customers ;
by country;

Most likely the data is not sorted in ascending order of COUNTRY. That would make a lot of sense when Venezuela is listed as the first country. Check the SAS LOG for an error message like this one:
1256 proc print data=sashelp.class;
1257 by sex;
1258 run;
ERROR: Data set SASHELP.CLASS is not sorted in ascending sequence. The current BY
group has Sex = M and the next BY group has Sex = F.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 2 observations read from the data set SASHELP.CLASS.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
If the data is sorted by DESCENDING order of COUNTRY then tell PROC PRINT that fact.
proc print data=data.customers ;
by descending country;
run;
If it is grouped by country but not actually sorted then use the NOTSORTED keyword.
proc print data=data.customers ;
by country notsorted;
run;
And if it is not even grouped by country then sort it first.
proc sort data=data.customers ;
by country ;
run;
proc print data=data.customers ;
by country ;
run;
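The NOTSORTED case can be tried on a small data set. This is only a sketch: the data set name work.customers_demo and its rows are made up for illustration and are not the asker's data.

```sas
/* Hypothetical sample: rows are grouped by country but not in ascending order */
data work.customers_demo;
  input country :$12. customer :$10.;
  datalines;
Venezuela Ana
Venezuela Luis
USA Bob
USA Carol
Brazil Davi
;
run;

/* NOTSORTED lets PROC PRINT start a new BY group whenever the value changes.
   Without it, the step would stop with the sort-order error after the Venezuela rows. */
proc print data=work.customers_demo;
  by country notsorted;
run;
```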


Using a contains operator or equivalent [closed]

Closed 5 years ago.
In SAS, how do I use a contains (or alternative) operator when there is more than one set of letters to choose from? E.g. where have_variable = abd, afg, afd, acc and want_variable = abd, afg, afd (containing ab or af only).
I've split your have and want lists into two tables with one record per value, then left joined from the have list to find the matching ones.
The code below builds the final table.
/* Create your input String */
data Have;
have="abd , afg , afd , acc";
run;
data Want ;
want="abd , afg , afd";
run;
/* Split input strings into multiple rows */
data Have_List;
set Have;
do i=1 by 0;
source=lowcase(scan(have,i,','));
if missing(source) then leave;
output;
i+1;
end;
keep source ;
run;
data Want_List;
set Want;
do i=1 by 0;
lookup=lowcase(scan(want,i,','));
if missing(lookup) then leave;
match='match';
output;
i+1;
end;
keep lookup match;
run;
/* Create a SQL left join to lookup the matching values */
proc sql;
create table match as
select h.source as have , COALESCE(w.match,"no-match") as match
from have_list h left join want_list w on h.source=w.lookup;
quit;
You can use a list with the IN operator in your select statement, like this:
proc sql;
select * from my_table where have_variable in ('abd','afg','afd','acc') and want_variable in ('abd','afg','afd');
quit;
You can also use the IN operator in a data step, like this:
data want;
set my_table;
if have_variable in ('abd','afg','afd','acc') and
want_variable in ('abd','afg','afd');
run;
If you only want the values that contain one of the two-letter strings, you can use the LIKE operator:
proc sql;
select * from my_table where have_variable like '%ab%' or have_variable like '%af%';
quit;
In a data step:
data want;
set my_table;
where have_variable like '%ab%' or
have_variable like '%af%';
run;
Regards
If you only want records that begin with ab or af (rather than containing them anywhere in the string), you can use IN followed by a colon (:). With this usage, the colon instructs SAS to compare only the first n characters of the string, where n is the length of the comparison value (2 in your example).
Note that this only works in a datastep, not proc sql.
data have;
input have_var $;
datalines;
abd
afg
afd
acc
;
run;
data _null_;
set have;
where have_var in: ('ab','af');
put _all_;
run;
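In PROC SQL, a similar begins-with filter can be written with LIKE and a trailing wildcard only. A minimal sketch, assuming a copy of the sample data under the hypothetical name work.have_demo:

```sas
/* Same four sample values as above, stored under a hypothetical name */
data work.have_demo;
  input have_var $;
  datalines;
abd
afg
afd
acc
;
run;

/* 'ab%' matches values that start with ab; no leading % means no match mid-string */
proc sql;
  select have_var
  from work.have_demo
  where have_var like 'ab%' or have_var like 'af%';
quit;
```

This selects abd, afg and afd, and leaves out acc.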

calculate industry average of variable NetMargin by date (DT)

data work.smallmarket;
set work.market;
where country='Nigeria';
NetMargin=profit2/Rev2;
keep Product# NetMargin DT;
run;
Question 1: How can I calculate an industry average NetMargin by date (DT) across all products, bearing in mind that not all products will have data? That is, no data is not the same as 0.
Question 2: How can I calculate a moving industry average for NetMargin?
Question 1:
PROC MEANS leaves missing NetMargin values out of the mean, so products with no data are not counted as 0.
proc sort data=smallmarket; by DT; run;
proc means data=smallmarket noprint;
by DT;
output out=by_date
mean(NetMargin)=
;
run;
Question 2:
If you have access to SAS/ETS, you could use PROC EXPAND; if not, you can find a worked example at:
http://support.sas.com/kb/25/027.html
Edit: found better example:
https://communities.sas.com/t5/Base-SAS-Programming/Calculate-moving-average-by-group/td-p/296267?nobounce
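For cases where PROC EXPAND is not available, a moving average can also be sketched in a plain data step with the LAG functions. This is an illustration under assumptions, not the method from the linked pages: the data set work.by_date_demo, its values, the output name NetMargin_ma3 and the 3-row window are all hypothetical.

```sas
/* Hypothetical per-date industry averages, one row per DT, already in date order */
data work.by_date_demo;
  input DT :date9. NetMargin;
  format DT date9.;
  datalines;
01JAN2015 0.10
02JAN2015 0.20
03JAN2015 0.30
04JAN2015 0.40
05JAN2015 0.50
;
run;

/* 3-row moving average: MEAN ignores missing arguments, so the first two rows
   average over fewer than three values instead of returning missing */
data work.by_date_ma;
  set work.by_date_demo;
  NetMargin_ma3 = mean(NetMargin, lag1(NetMargin), lag2(NetMargin));
run;

proc print data=work.by_date_ma;
run;
```

The LAG calls must run on every row (not inside an IF), otherwise the lag queues get out of step with the data.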

How to reshape data wide to long [duplicate]

This question already has answers here:
SAS Data formatting (reverse proc transpose?)
(2 answers)
Closed 7 years ago.
I want to reshape data from columns to rows.
The initial table is shown below:
ID1 ID2 ID3 Name
----------------------------
I001 I002 I003 John
The desired table looks like this:
ID Name
------------
I001 John
I002 John
I003 John
Can anyone help out?
Thanks lots!!
One way to do this is to set up an array of IDs and loop through with an explicit OUTPUT statement.
data want;
set have;
array ids(3) id1-id3;
/* write one output row per ID column */
do i=1 to dim(ids);
ID=ids(i);
OUTPUT;
end;
keep ID Name;
run;
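You can also use PROC TRANSPOSE. Make sure your data is sorted by NAME. Because ID1-ID3 are character variables, they must be listed in a VAR statement, and the transposed values arrive in COL1:
proc transpose data=have out=want(rename=(col1=ID) drop=_name_);
by Name;
var id1-id3;
run;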

Boxplots where date grouped by year

I recently asked a question about grouping in SAS. Drawing on that question, and using the same data set, I am struggling to make a box plot.
The data look like this:
Date Close Volume
12/31/2014 222.41 2402097
12/30/2014 222.23 2903242
12/29/2014 225.71 2811828
12/26/2014 227.82 3327016
12/24/2014 222.26 1333518
12/23/2014 220.97 4513321
12/22/2014 222.6 4806917
12/19/2014 219.29 6910461
12/18/2014 218.26 7483349
12/17/2014 205.82 7367834
12/16/2014 197.81 8426105
12/15/2014 204.04 5218252
12/12/2014 207 7173782
This data set actually covers two full years 2013 - 14. I would like a boxplot for each year and variable (Close and Volume).
Here is what I tried:
proc boxplot data=tsla;
class Date;
format Date year.;
plot Close*Date;
run;
But that returns an error "
ERROR 180-322: Statement is not valid or it is used out of proper order.
162 format Date year.;
163 plot Close*Date;
164 run;
"
What's the right order then?
How can I get SAS to give me 4 boxplots total? 2 variables (Close and Volume) and over two years (2013 - 14)?
There is no class statement in proc boxplot.
First, add a "year" variable using a data step.
data tsla2;
set tsla;
year=year(date);
run;
Sort by year:
proc sort data=tsla2;
by year;
run;
Use by statement in proc boxplot:
proc boxplot data=tsla2;
by year;
plot close*year;
plot volume*year;
run;
If you want all the years plotted together for each variable, there is no need to sort or use a by statement. Just do:
proc boxplot data=tsla2;
plot close*year;
plot volume*year;
run;
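As an alternative to PROC BOXPLOT, the same per-year boxes can be drawn with the VBOX statement of PROC SGPLOT, which needs no sorting. This is a sketch under assumptions: work.tsla_demo is a hypothetical stand-in for the real data, the 2014 rows are copied from the question, and the 2013 rows are made-up placeholder values, not real prices.

```sas
/* Hypothetical sample with a year variable derived from Date */
data work.tsla_demo;
  input Date :mmddyy10. Close Volume;
  year = year(Date);
  format Date mmddyy10.;
  datalines;
12/31/2014 222.41 2402097
12/30/2014 222.23 2903242
12/29/2014 225.71 2811828
12/31/2013 150.00 4000000
12/30/2013 152.00 5000000
12/27/2013 151.00 4500000
;
run;

/* One box per year for Close */
proc sgplot data=work.tsla_demo;
  vbox Close / category=year;
run;

/* One box per year for Volume */
proc sgplot data=work.tsla_demo;
  vbox Volume / category=year;
run;
```

Two variables over two years give the four boxes asked for, two per graph.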

Contingency table in SAS

I have data on exam results for 2 years for a number of students. I have a column with the year, the student's name and the mark. Some students don't appear in year 2 because they don't sit any exams in the second year. I want to show whether the performance of students persists or whether there's any pattern in their subsequent performance. I can split the data into two halves of equal size to account for the 'first-half' and 'second-half' marks. I can also split the first half into quintiles according to the exam results using 'proc rank'.
I know the output I want is a 5 x 5 table that has the original 5 quintiles on one axis and the 5 subsequent quintiles plus a 'dropped out' category on the other, so really a 5 x 6 matrix. There will obviously be around 20% of the total number of students in each quintile in the first exam, and if there's no relationship there should be 16.67% in each of the 6 subsequent categories. But I don't know how to proceed to show whether this is the case or not with this data.
How can I go about doing this in SAS, please? Could someone point me towards a good tutorial that would show how to set this up? I've been searching for terms like 'performance persistence' etc, but to no avail. . .
I've been proceeding like this to set up my dataset. I've added a column with 0 or 1 for the first or second half of the data using the first procedure below. I've also added a column with the quintile rank in terms of marks for all the students. But I think I've gone about this the wrong way. Shouldn't I be dividing the data into quintiles in each half, rather than across the whole two periods?
Proc rank groups=2;
var yearquarter;
ranks ExamRank;
run;
Proc rank groups=5;
var percentageResult;
ranks PerformanceRank;
run;
Thanks in advance.
Why are you dividing the data into quintiles?
I would leave the scores as they are, then make a scatterplot with a loess fit:
proc sgplot data=dataset;
scatter x=year1 y=year2;
loess x=year1 y=year2 / nomarkers;
run;
Here's a fairly basic example of the simple tabulation. I transpose your quintile data and then make a table. Here there is basically no relationship, except that I only allow a 5% DNF so you have more like 19% 19% 19% 19% 19% 5%.
data have;
do i = 1 to 10000;
do year = 1 to 2;
if year=2 and ranuni(7) < 0.05 then call missing(quintile);
else quintile = ceil(5*ranuni(7));
output;
end;
end;
run;
proc transpose data=have prefix=year out=have_t;
by i;
var quintile;
id year;
run;
proc tabulate data=have_t missing;
class year1 year2;
tables year1,year2*rowpctn;
run;
PROC CORRESP might be helpful for the analysis, though it doesn't look like it does exactly what you want.
proc corresp data=have_t outc=want outf=want2 missing;
tables year1,year2;
run;
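On the question of forming quintiles within each period rather than across both, PROC RANK accepts a BY statement, so the ranking can be done separately per year. A minimal sketch, assuming one row per student and year; the data set names, the simulated marks and the variable name quintile are hypothetical.

```sas
/* Hypothetical input: simulated percentage marks for 200 students over 2 years */
data work.results_demo;
  call streaminit(2024);
  do student = 1 to 200;
    do year = 1 to 2;
      percentageResult = round(100 * rand('uniform'), 0.1);
      output;
    end;
  end;
run;

/* BY processing in PROC RANK needs the data sorted by the BY variable */
proc sort data=work.results_demo;
  by year;
run;

/* GROUPS=5 assigns group numbers 0 to 4 within each year */
proc rank data=work.results_demo out=work.results_ranked groups=5;
  by year;
  var percentageResult;
  ranks quintile;
run;
```

The ranked output can then be sorted by student and transposed to one row per student, as in the tabulation example above.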