SAS: Pivot table with PROC TABULATE or PROC TRANSPOSE

My data is currently organized as:
NAME COLOR PROFILE HEIGHT
K2000 RED C 5.10
K2001 WHITE B 7.11
K2001 BLACK B 5.12
K2001 BLUE B 5.2
K2002 BLUE A 9.3
K2002 RED A 8.2
K2006 WHITE D 5.5
K2007 WHITE A 8.6
K2007 BLUE A 5.7
K2009 WHITE D 8.8
K2010 BLACK B 5.9
K2011 RED B 9.6
K2012 RED C 7.7
K2012 BLUE C 9.6
K2012 WHITE C 7.5
K2012 BLACK C 8.9
I would like it to look like:
NAME   PROFILE  RED   WHITE  BLACK  BLUE
K2000  C        5.10
K2001  B              7.11   5.12   5.2
K2002  A        8.2                 9.3
K2006  D              5.5
K2007  A              8.6           5.7
K2009  D              8.8
K2010  B                     5.9
K2011  B        9.6
K2012  C        7.7   7.5    8.9    9.6
Regards,

data I_HAVE;
infile datalines truncover firstobs=2;
input
@01 NAME $5.
@10 COLOR $5.
@19 PROFILE $1.
@25 HEIGHT 4.2
;
datalines;
----+----1----+----2----+----3
K2000    RED      C     5.10
K2001    WHITE    B     7.11
K2001    BLACK    B     5.12
K2001    BLUE     B     5.2
K2002    BLUE     A     9.3
K2002    RED      A     8.2
K2006    WHITE    D     5.5
K2007    WHITE    A     8.6
K2007    BLUE     A     5.7
K2009    WHITE    D     8.8
K2010    BLACK    B     5.9
K2011    RED      B     9.6
K2012    RED      C     7.7
K2012    BLUE     C     9.6
K2012    WHITE    C     7.5
K2012    BLACK    C     8.9
;
proc transpose data=I_HAVE out=I_NEED;
by NAME PROFILE;
id COLOR;
var HEIGHT;
run;

If you want to use PROC TABULATE for reporting purposes, here is one approach:
proc tabulate data=I_HAVE;
class name profile color;
var height;
table name*profile,
height=''*color=''*sum=''*f=best.;
run;
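For comparison, the same wide pivot can be sketched in pandas. This is a hypothetical Python illustration, not part of either SAS answer; only a subset of the rows is hard-coded.

```python
import pandas as pd

# A few rows of the I_HAVE data, hard-coded for illustration
have = pd.DataFrame({
    "NAME":    ["K2001", "K2001", "K2001", "K2012", "K2012"],
    "COLOR":   ["WHITE", "BLACK", "BLUE",  "RED",   "BLUE"],
    "PROFILE": ["B",     "B",     "B",     "C",     "C"],
    "HEIGHT":  [7.11,    5.12,    5.2,     7.7,     9.6],
})

# One row per NAME*PROFILE pair, one column per COLOR -- the shape of I_NEED.
# aggfunc="first" just picks the single HEIGHT per cell (each cell is unique here).
need = have.pivot_table(index=["NAME", "PROFILE"], columns="COLOR",
                        values="HEIGHT", aggfunc="first")
print(need)
```

Missing NAME*COLOR combinations come out as NaN, matching the blanks PROC TRANSPOSE leaves.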

Related

Power BI: How to calculate average/stdev of slicer-selected items within a date range?

I have a Power BI table that looks like this:
Cat sales Date
apple 2.0 03/18/2021
apple 1.8 03/19/2021
apple 2.0 03/20/2021
apple 2.5 03/21/2021
peach 3.1 03/18/2021
peach 3.0 03/19/2021
peach 1.7 03/20/2021
peach 2.0 03/21/2021
pear 4.2 03/18/2021
pear 4.4 03/19/2021
pear 3.9 03/20/2021
pear 4.9 03/21/2021
I want to use a date slicer and a Cat filter to calculate (average sales)/(standard deviation of sales) during a period and display it in a tile:
For example:
Slicer select: Apple and Pear
Date Slicer: 03/18/2021 - 03/20/2021
Formula:
Step 1: Average = (6.2 + 6.2 + 5.9) / 3 = 6.1
03/18/2021: 2.0 + 4.2 = 6.2
03/19/2021: 1.8 + 4.4 = 6.2
03/20/2021: 2.0 + 3.9 = 5.9
Step 2: Stdev(6.2,6.2,5.9) = 0.1732
Step 3: Average/Stdev = 6.1/0.1732 = 35.22
Tile display: 35.22
I am new to DAX and wasn't able to figure out the right formula. Please help!
I figured it out myself (correct me if I am wrong) and posted it here in case it is ever useful for other people:
Ratio =
AVERAGEX(
KEEPFILTERS(VALUES('Data_table'[Sales])),
CALCULATE(AVERAGE('Data_table'[Sales]))
)/
STDEVX.P(
KEEPFILTERS(VALUES('Data_table'[Sales])),
CALCULATE(SUM('Data_table'[Sales]))
)
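As a quick sanity check on the worked arithmetic above (plain Python, not DAX; note that the 0.1732 in the hand calculation is the sample standard deviation, while STDEVX.P computes the population version):

```python
from statistics import mean, stdev

daily_totals = [6.2, 6.2, 5.9]   # Apple + Pear sums for 03/18 - 03/20
avg = mean(daily_totals)          # 6.1
sd = stdev(daily_totals)          # sample stdev, ~0.1732
print(round(avg / sd, 2))         # 35.22
```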

Fetching data from a web page into a DataFrame

I am trying to scrape time series data into a pandas DataFrame, using Python 2.7, from this web page (http://owww.met.hu/eghajlat/eghajlati_adatsorok/bp/Navig/202_EN.htm). Could somebody please show me how to write the code? Thanks!
I tried my code as follows:
html =urllib.urlopen("http://owww.met.hu/eghajlat/eghajlati_adatsorok/bp/Navig/202_EN.htm");
text= html.read();
df=pd.DataFrame(index=datum, columns=['m_ta','m_tax','m_taxd', 'm_tan','m_tand'])
But it doesn't produce anything. I want to display the table as it is.
You can use BeautifulSoup to parse all font tags, then split column a, set_index on column idx, and rename_axis(None) to remove the index name:
import pandas as pd
import urllib
from bs4 import BeautifulSoup
html = urllib.urlopen("http://owww.met.hu/eghajlat/eghajlati_adatsorok/bp/Navig/202_EN.htm")
soup = BeautifulSoup(html, 'html.parser')
#print soup
fontTags = soup.findAll('font')
#print fontTags
#get text from the font tags
li = [x.text for x in fontTags]
#skip the first 13 items, which do not contain the table data
df = pd.DataFrame(li[13:], columns=['a'])
#split data by arbitrary whitespace
df = df.a.str.split(r'\s+', expand=True)
#set column names
df.columns = ['idx','m_ta','m_tax','m_taxd', 'm_tan','m_tand']
#convert column idx to period
df['idx'] = pd.to_datetime(df['idx']).dt.to_period('M')
#convert columns to datetime
df['m_taxd'] = pd.to_datetime(df['m_taxd'])
df['m_tand'] = pd.to_datetime(df['m_tand'])
#set column idx to index, remove index name
df = df.set_index('idx').rename_axis(None)
print df
m_ta m_tax m_taxd m_tan m_tand
1901-01 -4.7 5.0 1901-01-23 -12.2 1901-01-10
1901-02 -2.1 3.5 1901-02-06 -7.9 1901-02-15
1901-03 5.8 13.5 1901-03-20 0.6 1901-03-01
1901-04 11.6 18.2 1901-04-10 7.4 1901-04-23
1901-05 16.8 22.5 1901-05-31 12.2 1901-05-05
1901-06 21.0 24.8 1901-06-03 14.6 1901-06-17
1901-07 22.4 27.4 1901-07-30 16.9 1901-07-04
1901-08 20.7 25.9 1901-08-01 14.7 1901-08-29
1901-09 15.9 19.9 1901-09-01 11.8 1901-09-09
1901-10 12.6 17.9 1901-10-04 8.3 1901-10-31
1901-11 4.7 11.1 1901-11-14 -0.2 1901-11-26
1901-12 4.2 8.4 1901-12-22 -1.4 1901-12-07
1902-01 3.4 7.5 1902-01-25 -2.2 1902-01-15
1902-02 2.8 6.6 1902-02-09 -2.8 1902-02-06
1902-03 5.3 13.3 1902-03-22 -3.5 1902-03-13
1902-04 10.5 15.8 1902-04-21 6.1 1902-04-08
1902-05 12.5 20.6 1902-05-31 8.5 1902-05-10
1902-06 18.5 23.8 1902-06-30 14.4 1902-06-19
1902-07 20.2 25.2 1902-07-01 15.5 1902-07-03
1902-08 21.1 25.4 1902-08-07 14.7 1902-08-13
1902-09 16.1 23.8 1902-09-05 9.5 1902-09-24
1902-10 10.8 15.4 1902-10-12 4.9 1902-10-25
1902-11 2.4 9.1 1902-11-01 -4.2 1902-11-18
1902-12 -3.1 7.2 1902-12-27 -17.6 1902-12-15
1903-01 -0.5 8.3 1903-01-11 -11.5 1903-01-23
1903-02 4.6 13.4 1903-02-23 -2.7 1903-02-17
1903-03 9.0 16.1 1903-03-28 4.9 1903-03-09
1903-04 9.0 16.5 1903-04-29 2.6 1903-04-19
1903-05 16.4 21.2 1903-05-03 11.3 1903-05-19
1903-06 19.0 23.1 1903-06-03 15.6 1903-06-07
... ... ... ... ... ...
1998-07 22.5 30.7 1998-07-23 15.0 1998-07-09
1998-08 22.3 30.5 1998-08-03 14.8 1998-08-29
1998-09 16.0 21.0 1998-09-12 10.4 1998-09-14
1998-10 11.9 17.2 1998-10-07 8.2 1998-10-27
1998-11 3.8 8.4 1998-11-05 -1.6 1998-11-21
1998-12 -1.6 6.2 1998-12-14 -8.2 1998-12-26
1999-01 0.6 4.7 1999-01-15 -4.8 1999-01-31
1999-02 1.5 6.9 1999-02-05 -4.8 1999-02-01
1999-03 8.2 15.5 1999-03-31 3.0 1999-03-16
1999-04 13.1 17.1 1999-04-16 6.1 1999-04-18
1999-05 17.2 25.2 1999-05-31 11.1 1999-05-06
1999-06 19.8 24.4 1999-06-07 12.2 1999-06-22
1999-07 22.3 28.0 1999-07-06 16.3 1999-07-23
1999-08 20.6 26.7 1999-08-09 17.3 1999-08-23
1999-09 19.3 22.9 1999-09-26 15.0 1999-09-02
1999-10 11.5 19.0 1999-10-03 5.7 1999-10-18
1999-11 3.9 12.6 1999-11-04 -2.2 1999-11-21
1999-12 1.3 6.4 1999-12-13 -8.1 1999-12-25
2000-01 -0.7 8.7 2000-01-31 -6.6 2000-01-25
2000-02 4.5 10.2 2000-02-01 -0.1 2000-02-23
2000-03 6.7 11.6 2000-03-09 0.6 2000-03-17
2000-04 14.8 22.1 2000-04-21 5.8 2000-04-09
2000-05 18.7 23.9 2000-05-27 12.3 2000-05-22
2000-06 21.9 29.3 2000-06-14 15.4 2000-06-17
2000-07 20.3 26.6 2000-07-03 14.0 2000-07-16
2000-08 23.8 29.7 2000-08-20 18.5 2000-08-31
2000-09 16.1 21.5 2000-09-14 12.7 2000-09-24
2000-10 14.1 18.7 2000-10-04 8.0 2000-10-23
2000-11 9.0 14.9 2000-11-15 3.7 2000-11-30
2000-12 3.0 9.4 2000-12-14 -6.8 2000-12-24
[1200 rows x 5 columns]
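Note that the answer above is Python 2 (`urllib.urlopen`, `print df`). Under Python 3 the same font-tag extraction can be sketched with only the standard library; the snippet below is a toy stand-in for the live page, using the first data row from the output above:

```python
from html.parser import HTMLParser

class FontTextCollector(HTMLParser):
    """Collect the text inside every <font> tag."""
    def __init__(self):
        super().__init__()
        self.in_font = False
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "font":
            self.in_font = True

    def handle_endtag(self, tag):
        if tag == "font":
            self.in_font = False

    def handle_data(self, data):
        if self.in_font and data.strip():
            self.texts.append(data.strip())

# Toy HTML mimicking one row of the real page
snippet = "<font>1901-01 -4.7 5.0 1901-01-23 -12.2 1901-01-10</font>"
parser = FontTextCollector()
parser.feed(snippet)

# Same arbitrary-whitespace split as in the pandas answer above
rows = [t.split() for t in parser.texts]
print(rows[0])
```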

Append one column below another

I want to append one column below another column. My dataset looks like the following:
date xy ab cd
1 1.5 3.1 4.8
2 4.3 8.5 1.0
3 7.7 9.1 7.7
I want to create a dataset which looks like this:
date id price
1 xy 1.5
2 xy 4.3
3 xy 7.7
1 ab 3.1
2 ab 8.5
3 ab 9.1
1 cd 4.8
2 cd 1.0
3 cd 7.7
Do you have an idea how I can handle this?
Like this:
proc transpose data=indataname out=outdataname(rename=(_NAME_=id col1=price));
by date;
run;
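For reference, the pandas equivalent of this wide-to-long step is `melt`. This is a Python sketch using the sample data, not part of the SAS answer:

```python
import pandas as pd

wide = pd.DataFrame({
    "date": [1, 2, 3],
    "xy":   [1.5, 4.3, 7.7],
    "ab":   [3.1, 8.5, 9.1],
    "cd":   [4.8, 1.0, 7.7],
})

# Stack the xy, ab and cd columns below one another, keyed by date
long_df = wide.melt(id_vars="date", var_name="id", value_name="price")
print(long_df)
```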

Why were the ranks different?

One:
data have;
input x1 x2;
diff=x1-x2;
a_diff= round(abs(diff), .01);
* a_diff=abs(diff);
cards;
50.7 60
3.3 3.3
28.8 30
46.2 43.2
1.2 2.2
25.5 27.5
2.9 4.9
5.4 5
3.8 3.2
1 4
;
run;
proc rank data=have out=have_r;
where diff;
var a_diff;
ranks a_diff_r;
run;
proc print data=have_r; run;
Results:
Obs x1 x2 diff a_diff a_diff_r
1 50.7 60.0 -9.3 9.3 9.0
2 28.8 30.0 -1.2 1.2 4.0
3 46.2 43.2 3.0 3.0 7.5
4 1.2 2.2 -1.0 1.0 3.0
5 25.5 27.5 -2.0 2.0 5.5
6 2.9 4.9 -2.0 2.0 5.5
7 5.4 5.0 0.4 0.4 1.0
8 3.8 3.2 0.6 0.6 2.0
9 1.0 4.0 -3.0 3.0 7.5
Two:
data have;
input x1 x2;
diff=x1-x2;
a_diff=abs(diff);
cards;
50.7 60
3.3 3.3
28.8 30
46.2 43.2
1.2 2.2
25.5 27.5
2.9 4.9
5.4 5
3.8 3.2
1 4
;
run;
proc rank data=have out=have_r;
where diff;
var a_diff;
ranks a_diff_r;
run;
proc print data=have_r; run;
results:
Obs x1 x2 diff a_diff a_diff_r
1 50.7 60.0 -9.3 9.3 9.0
2 28.8 30.0 -1.2 1.2 4.0
3 46.2 43.2 3.0 3.0 7.5
4 1.2 2.2 -1.0 1.0 3.0
5 25.5 27.5 -2.0 2.0 5.0
6 2.9 4.9 -2.0 2.0 6.0
7 5.4 5.0 0.4 0.4 1.0
8 3.8 3.2 0.6 0.6 2.0
9 1.0 4.0 -3.0 3.0 7.5
Please look at obs 3, 9, 5, and 6: why were the ranks different? Thank you!
Run the code below and you'll see that the values are actually different. That's because of inaccuracies in numeric storage. It is similar to how 1/3 is not representable in decimal notation (0.333333333333333 etc.): if you use, say, ten digits to store each intermediate result, then 1 - (1/3) - (1/3) - (1/3) is not equal to zero (it comes out to 0.000000001). Any computer system will have issues with certain numbers that, while they appear to store nicely in decimal (base 10), do not in binary.
The solution here is basically to round as you are doing, or to fuzz the result, which amounts to the same thing (it ignores differences smaller than 1x10^-12).
data have;
input x1 x2;
diff=x1-x2;
a_diff=abs(diff);
put a_diff= hex16.;
cards;
50.7 60
3.3 3.3
28.8 30
46.2 43.2
1.2 2.2
25.5 27.5
2.9 4.9
5.4 5
3.8 3.2
1 4
;
run;
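The same storage effect is easy to reproduce in any language that uses IEEE doubles, as SAS does. A short Python illustration with the obs 5 and obs 6 pairs from the data (a sketch of the idea, not the poster's code):

```python
a = abs(25.5 - 27.5)   # 25.5 and 27.5 are exact in binary, so a is exactly 2.0
b = abs(2.9 - 4.9)     # 2.9 and 4.9 are not, so b carries a tiny error

print(a == b)                      # False: unequal values, hence untied ranks
print(float.hex(a), float.hex(b))  # same idea as PUT a_diff= HEX16. above
print(round(a, 2) == round(b, 2))  # True: rounding restores the tie
```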

Read by Row in SAS

How do I read a data file one row at a time in SAS?
Say, I have 3 lines of data
1.0 3.0 5.6 7.8
2.3 4.9
3.2 5.3 6.8 7.5 3.9 4.1
I need each line's values tagged with a different group. I want the data to look like:
A 1.0
A 3.0
A 5.6
A 7.8
B 2.3
B 4.9
C 3.2
C 5.3
C 6.8
C 7.5
C 3.9
C 4.1
I tried a bunch of things.
If there were a group name before every data point, the following code would work fine:
INPUT group $ x @@;
I can't figure out how to go about this without one. Can someone please guide me?
Thanks
I think this will produce almost exactly the result you want. You could apply a format to the Group variable.
data orig;
infile datalines missover pad;
format Group 4. Value 4.1;
Group = _n_;
do until (Value eq .);
input Value @;
if value ne . then output;
else return;
end;
datalines;
1.0 3.0 5.6 7.8
2.3 4.9
3.2 5.3 6.8 7.5 3.9 4.1
run;
proc print; run;
/*
Obs Group Value
1 1 1.0
2 1 3.0
3 1 5.6
4 1 7.8
5 2 2.3
6 2 4.9
7 3 3.2
8 3 5.3
9 3 6.8
10 3 7.5
11 3 3.9
12 3 4.1 */
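The same group-per-line logic can be sketched in plain Python for comparison (not SAS; the three data lines are inlined):

```python
data = """\
1.0 3.0 5.6 7.8
2.3 4.9
3.2 5.3 6.8 7.5 3.9 4.1"""

# One group number per input line, one output row per value on that line
rows = []
for group, line in enumerate(data.splitlines(), start=1):
    for value in line.split():
        rows.append((group, float(value)))

for group, value in rows:
    print(group, value)
```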