I have the following dataset and hoping to transpose it into long form:
data have ;
input Height $ Front Middle Rear ;
cards;
Low 125 185 126
Low 143 170 136
Low 150 170 129
Low 138 195 136
Low 149 162 147
Medium 141 176 128
Medium 137 161 133
Medium 145 167 148
Medium 150 165 145
Medium 130 184 141
High 129 157 149
High 141 152 137
High 148 186 138
High 130 164 126
High 137 176 138
;
run;
Here Height is low, medium and high. Location is front, middle and rear. Numerical values are prices by location and height of a book on a bookshelf.
I'm hoping to transpose the dataset into long form with columns:
Height, Location and Price
The following code only allows me to transpose Location into long form. How should I transpose Height at the same time?
data bookp;
set bookp;
dex = _n_;
run;
proc sort data=bookp;
by dex;
run;
proc transpose data=bookp
out=bookpLong (rename=(col1=price _name_= location )drop= _label_ dex);
var front middle rear;
by dex;
run;
I think you just need to include HEIGHT in the BY statement.
First let's convert your example data into a SAS dataset.
data have ;
input Height $ Front Middle Rear ;
cards;
Low 125 185 126
Low 143 170 136
Medium 141 176 128
Medium 137 161 133
High 129 157 149
High 141 152 137
;
Now let's add an identifier to uniquely identify each row. Note that if you are really reading the data with a data step you could do this in the same step that reads the data.
data with_id ;
row_num+1;
set have;
run;
Now we can transpose.
proc transpose data=with_id out=want (rename=(_name_=Location col1=Price));
by row_num height ;
var front middle rear ;
run;
Results:
Obs row_num Height Location Price
1 1 Low Front 125
2 1 Low Middle 185
3 1 Low Rear 126
4 2 Low Front 143
5 2 Low Middle 170
6 2 Low Rear 136
7 3 Medium Front 141
8 3 Medium Middle 176
9 3 Medium Rear 128
10 4 Medium Front 137
11 4 Medium Middle 161
12 4 Medium Rear 133
13 5 High Front 129
14 5 High Middle 157
15 5 High Rear 149
16 6 High Front 141
17 6 High Middle 152
18 6 High Rear 137
Related
Initially, I thought it can be easily implemented using a single for-loop with pre-calculated step size.
// blue = (240, 255, 255)
// red = (0, 255, 255)
const int step = (240 - 0) / (N - 1);
QColor color;
for (int i = 0; i < N; ++i)
{
color.setHsv(i * step, 255, 255);
}
But as N grows larger, the ending color may not be what I expected.
For example (N == 82), ending color is hsv(162, 255, 255) instead of hsv(240, 255, 255).
My intention is to 1) generate N distinct color, 2) spectrum-like, 3) starts in RED and ends in BLUE.
Should I take S, V into consideration for my requirement as well?
Right now, you're choosing an integer step, and adding it to get from the smallest to the largest value. This has the advantage of spacing the hues evenly. But as you've observed, it has the disadvantage of getting the range wrong (possibly quite badly wrong) at times.
If you're willing to sacrifice a little in the way of the spacing of hues being perfectly even to get closer to filling the specified range, you can compute the step as a floating point number, do floating point multiplication, and round (or truncate) afterwards.
std::vector<int> interpolate_range(int lower, int upper, int step_count) {
double step = (upper - lower) / (step_count - 1.0);
std::vector<int> ret;
for (int i=0; i<step_count; i++) {
// step is a double, so `double * i` is done in floating
// point, then the result is truncated to be pushed into
// ret.
ret.push_back(step * i);
}
return ret;
}
Skipping the other color components, and just printing out these values, we get this:
0 2 5 8 11 14 17 20 23 26
29 32 35 38 41 44 47 50 53 56
59 62 65 68 71 74 77 80 82 85
88 91 94 97 100 103 106 109 112 115
118 121 124 127 130 133 136 139 142 145
148 151 154 157 160 162 165 168 171 174
177 180 183 186 189 192 195 198 201 204
207 210 213 216 219 222 225 228 231 234
237 240
As you can see, we usually get a separation of 3, but a few times (0 to 2, 80 to 82 and 160 to 162) only 2, allowing us to fill the entire range.
I have to read a data set of 50 numbers from a text file. It's all in a row with a space delimiter and in multiple uneven lines. for example:
1 2 3 4 5 6
7 8 9 10 11 12
13 14 15
16 17 18 19 20 21
Etc.
The first 25 numbers belong to group 1, and the 2nd 25 belong to group 2. So I need to make a group variable (binary either 1 or 2), a count number (1 to 25), and a value variable which is holding the value of the number.
I am stuck on how to split the data in half when reading it. I tried to use truncover but it did not work.
Try something like this, replacing the datalines keyword with the path to your file:
data groups;
infile datalines;
format number 8. counter 2. group 1.; * Not mandatory, used here to order variables;
retain group (1);
input number ##;
counter + 1;
if counter = 26 then do;
group = 2;
counter = 1;
end;
datalines;
192 105 435 448 160 499 184 246 388 190 316
139 146 147 192 231 449 101 216 342 399 352 122 418
280 400 187 352 321 180 425 500 320 179 105
232 105 323 132 106 255 449
186 135 472 174 119 255
308 350
run;
I am currently practicing SAS programming on using two SAS dataset(sample and master) . Below are the hypothetical or dummy data created for illustration purpose to solve my problem through SAS programming . I would like to extract the data for the id's in sample dataset from master dataset(test). I have given an example with few id's as sample dataset, for which i need to extract next 12 month information from master table(test) for each id's based on the yearmonth information( desired output given in the third output).
Below is the code to extract the previous 12 month data but i am not getting idea to extract next 12 month records as pulled for previous months, Can anyone help me in solving this problem using SAS programming with optimized way.
proc sort data=test;
by id yearmonth;
run;
data result;
set test;
array prev_month {13} PREV_MONTH_0-PREV_MONTH_12;
by id;
if first.id then do;
do i =1 to 13;
prev_month(i)=0;
end;
end;
do i = 13 to 2 by -1;
prev_month(i)=prev_month(i-1);
end;
prev_month(1)=no_of_cust;
drop i prev_month_0;
retain prev_month:;
run;
data sample1;
set sample(drop=no_of_cust);
run;
proc sort data=sample1;
by id yearmonth;
run;
data all;
merge sample1(in=a) result(in=b);
by id yearmonth;
if a;
run;
One sample dataset (dataset name - sample).
ID YEARMONTH NO_OF_CUST
1 200909 50
1 201005 65
1 201008 78
1 201106 95
2 200901 65
2 200902 45
2 200903 69
2 201005 14
2 201006 26
2 201007 98
One master dataset - dataset name (test) (huge dataset over the year for each id from start of the account to till date.)
ID YEARMONTH NO_OF_CUST
1 200808 125
1 200809 125
1 200810 111
1 200811 174
1 200812 98
1 200901 45
1 200902 74
1 200903 73
1 200904 101
1 200905 164
1 200906 104
1 200907 22
1 200908 35
1 200909 50
1 200910 77
1 200911 86
1 200912 95
1 201001 95
1 201002 87
1 201003 79
1 201004 71
1 201005 65
1 201006 66
1 201007 66
1 201008 78
1 201009 88
1 201010 54
1 201011 45
1 201012 100
1 201101 136
1 201102 111
1 201103 17
1 201104 77
1 201105 111
1 201106 95
1 201107 79
1 201108 777
1 201109 758
1 201110 32
1 201111 15
1 201112 22
2 200711 150
2 200712 150
2 200801 44
2 200802 385
2 200803 65
2 200804 66
2 200805 200
2 200806 333
2 200807 285
2 200808 265
2 200809 222
2 200810 220
2 200811 205
2 200812 185
2 200901 65
2 200902 45
2 200903 69
2 200904 546
2 200905 21
2 200906 256
2 200907 214
2 200908 14
2 200909 44
2 200910 65
2 200911 88
2 200912 79
2 201001 65
2 201002 45
2 201003 69
2 201004 54
2 201005 14
2 201006 26
2 201007 98
Desired Output should like below,
ID YEARMONTH NO_OF_CUST AFTER_MONTH_1 AFTER_MONTH_2 AFTER_MONTH_3 AFTER_MONTH_4 AFTER_MONTH_5 AFTER_MONTH_6 AFTER_MONTH_7 AFTER_MONTH_8 AFTER_MONTH_9 AFTER_MONTH_10 AFTER_MONTH_11 AFTER_MONTH_12
1 200909 50 77 86 95 95 87 79 71 65 66 66 78 88
Step1: Join your sample table with the main(test) table and using intnx to get all the values for next 12 months.
Step2: Making a column names "after month"
Step3: Transpose to get your final output
proc sql;
create table abc as
select a.id,a.yearmonth,b.yearmonth as yearmonth1, b.no_of_cust
from
sample a
left join
test b
on a.id = b.id and a.yearmonth <= b.yearmonth <= intnx("month",a.yearmonth,12)
order by a.id,a.yearmonth,b.yearmonth;
quit;
data abc1(drop=col yearmonth1);
set abc;
by id yearmonth;
if first.yearmonth then col=-1;
col+1;
columns = compress("after_month_"||col);
run;
proc transpose data=abc1 out=abc2(rename=(after_month_0 = no_of_cust) drop=_name_);
by id yearmonth;
id columns;
var no_of_cust;
run;
My Output:
Or
If you want to make changes in your query then you could use the below code.
proc sort data=test;
by id descending yearmonth;
run;
data result;
set test;
array after_month {13} after_MONTH_0-after_MONTH_12;
by id;
if first.id then do;
do i = 1 to 13;
after_month(i) = 0;
end;
end;
do i = 13 to 2 by -1;
after_month(i) = after_month(i-1);
end;
after_month(1) = NO_OF_CUST;
drop i after_MONTH_0;
retain after_MONTH:;
run;
data sample1;
set sample(drop=no_of_cust);
run;
proc sort data=result;
by id yearmonth;
run;
proc sort data=sample1;
by id yearmonth;
run;
data all;
merge sample1(in=a) result(in=b);
by id yearmonth;
if a;
run;
Let me know in case of any queries.
a = 100
for b in range(10,a):
c = b%10
if c == 0:
c += 3
c = c*b
print c
I was trying to make a random generator without using random function and I made this, does it generate random numbers?
Short Answer:
No.
Your code will print
30 11 24 39 56 75 96 119 144 171 60 21 44 69 96 125 156 189 224 261 90 31 64 99 136 175 216 259 304 351 120 41 84 129 176 225 276 329 384 441 150 51 104 159 216 275 336 399 464 531 180 61 124 189 256 325 396 469 544 621 210 71 144 219 296 375 456 539 624 711 240 81 164 249 336 425 516 609 704 801 270 91 184 279 376 475 576 679 784 891
every time.
Computers and programs like these are deterministic. If you sat down with a pen and paper you could tell me exactly which of these number would occur, when they would occur.
Random number generation is difficult, what I would recommend is using time to (seem to) randomize the output.
import time
print int(time.time() % 10)
This will give you a "random" number between 0 and 9.
time.time() gives you the number of milliseconds since (I believe) epoch time. It's a floating point number so we have to cast to an int if we want a "whole" integer number.
Caveat: This solution is not truly random, but will act in a much more "random" fashion.
I have a table with 20 columns of measurements. I would like 'convert' the table into a table with 20 rows with columns of Avg, Min, Max, StdDev, Count types of information. There is another question like this but it was for the 'R' language. Other question here.
I could do the following for each column (processing the results with C++):
Select Count(Case When [avgZ_l1] <= 0.15 and avgZ_l1 > 0 then 1 end) as countValue1,
Count(case when [avgZ_l1] <= 0.16 and avgZ_l1 > 0.15 then 1 end) as countValue2,
Count(case when [avgZ_l1] <= 0.18 and avgZ_l1 > 0.16 then 1 end) as countValue3,
Count(case when [avgZ_l1] <= 0.28 and avgZ_l1 > 0.18 then 1 end) as countValue4,
Avg(avgwall_l1) as avg1, Min(avgwall_l1) as min1, Max(avgZ_l1) as max1,
STDEV(avgZ_l1) as stddev1, count(*) as totalCount from myProject.dbo.table1
But I do not want to process the 50,000 records 20 times (once for each column). I thought there would be away to 'pivot' the table onto its side and process the data at the same time. I have seen examples of the 'Pivot' but they all seem to pivot on a integer type field, Month number or Device Id. Once the table is converted I could then fetch each row with C++. Maybe this is really just 'Insert into ... select ... from' statements.
Would the fastest (execution time) approach be to simply create a really long select statement that returns all the information I want for all the columns?
We might end up with 500,000 rows. I am using C++ and SQL 2014.
Any thoughts or comments are welcome. I just don't want have my naive code to be used as a shining example of how NOT to do something... ;)...
If your table looks the same as the code that you sent in r then the following query should work for you. It selects the data that you requested and pivots it at the same time.
create table #temp(ID int identity(1,1),columnName nvarchar(50));
insert into #temp
SELECT COLUMN_NAME as columnName
FROM myProject.INFORMATION_SCHEMA.COLUMNS -- change myProject to the name of your database. Unless myProject is your database
WHERE TABLE_NAME = N'table1'; --change table1 to your table that your looking at. Unless table1 is your table
declare #TableName nvarchar(50) = 'table1'; --change table1 to your table again
declare #loop int = 1;
declare #query nvarchar(max) = '';
declare #columnName nvarchar(50);
declare #endQuery nvarchar(max)='';
while (#loop <= (select count(*) from #temp))
begin
set #columnName = (select columnName from #temp where ID = #loop);
set #query = 'select t.columnName, avg(['+#columnName+']) as Avg ,min(['+#columnName+']) as min ,max(['+#columnName+'])as max ,stdev(['+#columnName+']) as STDEV,count(*) as totalCount from '+#tablename+' join #temp t on t.columnName = '''+#columnName+''' group by t.columnName';
set #loop += 1;
set #endQuery += 'union all('+ #query + ')';
end;
set #endQuery = stuff(#endQuery,1,9,'')
Execute(#endQuery);
drop table #temp;
It creates a #temp table which stores the values of your column headings next to an ID. It then uses the ID when looping though the number of columns that you have. It then generates a query which selects what you want and then unions it together. This query will work on any number of columns meaning that if you add or remove more columns it should give the correct result.
With this input:
age height_seca1 height_chad1 height_DL weight_alog1
1 19 1800 1797 180 70
2 19 1682 1670 167 69
3 21 1765 1765 178 80
4 21 1829 1833 181 74
5 21 1706 1705 170 103
6 18 1607 1606 160 76
7 19 1578 1576 156 50
8 19 1577 1575 156 61
9 21 1666 1665 166 52
10 17 1710 1716 172 65
11 28 1616 1619 161 66
12 22 1648 1644 165 58
13 19 1569 1570 155 55
14 19 1779 1777 177 55
15 18 1773 1772 179 70
16 18 1816 1809 181 81
17 19 1766 1765 178 77
18 19 1745 1741 174 76
19 18 1716 1714 170 71
20 21 1785 1783 179 64
21 19 1850 1854 185 71
22 31 1875 1880 188 95
23 26 1877 1877 186 106
24 19 1836 1837 185 100
25 18 1825 1823 182 85
26 19 1755 1754 174 79
27 26 1658 1658 165 69
28 20 1816 1818 183 84
29 18 1755 1755 175 67
It will produce this output:
avg min max stdev totalcount
age 20 17 31 3.3 29
height_seca1 1737 1569 1877 91.9 29
height_chad1 1736 1570 1880 92.7 29
height_DL 173 155 188 9.7 29
weight_alog1 73 50 106 14.5 29
Hope this helps and works for you. :)