I have a file in long format, like so:
name weight month cal
bob 80 01 5000
ben 70 01 4989
mary 60 01 3000
bob 81 02 4999
ben 68 02 6000
mary 57 02 2800
...
I would like to create N linear regressions of weight over cal: one for each of the months.
I know how to read the data into a dataset and how to fit a regression model.
I am not sure how I do this in a loop for the N months...
Any pointers?
Many thanks!
Related
I am using random forest modelling for a project. The tool used is SAS Eguide.
My data covers 12 months of history with a moving payment history. The table below summarizes it.
acc payment_3months dueamt_3mons month
2314 100 200 01
2314 300 200 02
3241 450 450 01
3241 500 500 02
3241 250 350 03
Does this data look okay to feed into the random forest? Or should I aggregate the data, remove duplicates, and keep a single row for each acc value?
Please advise.
I am looking to create a dummy variable for Latin American Countries in my data set which I need to make for a log-log model. I know how to log all of them for my later regression. Any suggestion or help on how to make a dummy variable for the Latin American countries with my data would be appreciated.
data HW6;
input country : $25. midyear sancts lprots lfrac ineql pop;
cards;
CHILE 1955 58 44 65 57 6.743
CHILE 1960 19 34 65 57 7.785
CHILE 1965 27 24 65 57 8.510
CHILE 1970 36 29 65 57 9.369
CHILE 1975 38 58 65 57 10.214
COSTA_RICA 1955 16 7 54 60 1.024
COSTA_RICA 1960 6 1 54 60 1.236
COSTA_RICA 1965 2 1 54 60 1.482
COSTA_RICA 1970 3 1 54 60 1.732
COSTA_RICA 1975 2 3 54 60 1.965
INDIA 1955 81 134 47 52 404.478
INDIA 1960 101 190 47 52 445.857
INDIA 1965 189 845 47 52 494.882
INDIA 1970 133 915 47 52 553.619
INDIA 1975 132 127 47 52 616.551
JAMICA 1955 11 12 47 62 1.542
JAMICA 1960 9 2 47 62 1.629
JAMICA 1965 8 6 47 62 1.749
JAMICA 1970 1 1 47 62 1.877
JAMICA 1975 7 1 47 62 2.043
PHILIPPINES 1955 26 123 48 56 24.0
PHILIPPINES 1960 20 38 48 56 27.898
PHILIPPINES 1965 9 5 48 56 32.415
PHILIPPINES 1970 79 25 48 56 37.540
SRI_LANKA 1955 29 2 73 52 8.679
SRI_LANKA 1960 75 35 73 52 9.879
SRI_LANKA 1965 25 63 73 52 11.202
SRI_LANKA 1970 34 14 73 52 12.532
TURKEY 1955 79 1 67 61 24.145
TURKEY 1960 138 19 67 61 28.217
TURKEY 1965 36 51 67 61 31.951
TURKEY 1970 51 22 67 61 35.743
URUGUAY 1955 8 4 57 48 2.372
URUGUAY 1960 12 1 57 48 2.538
URUGUAY 1965 16 14 57 48 2.693
URUGUAY 1970 21 19 57 48 2.808
URUGUAY 1975 24 45 57 48 2.829
VENEZUELA 1955 38 14 76 65 6.110
VENEZUELA 1960 209 23 76 65 7.632
VENEZUELA 1965 100 162 76 65 9.119
VENEZUELA 1970 9 27 76 65 10.709
VENEZUELA 1975 4 12 76 65 12.722
;
data newData;
set HW6;
sancts = log (sancts);
lprots = log (lprots);
lfrac = log (lfrac);
ineql = log (ineql);
pop = log (pop);
run;
The GLMSELECT procedure is one simple way to create dummy variables.
There is a nice article about how to use it to generate dummy variables.
data newData;
set HW6;
sancts = log (sancts);
lprots = log (lprots);
lfrac = log (lfrac);
ineql = log (ineql);
pop = log (pop);
Y = 0; /* Create a fake response variable */
run;
proc glmselect data=newData noprint outdesign(addinputvars)=want(drop=Y);
class country;
model Y = country / noint selection=none;
run;
If you need them in a later step, use the macro variable &_GLSMOD, which the procedure creates and which contains the names of the dummy variables.
The real question here is not about SAS; it is about how to get the region of a country from its name.
I would try ISO 3166, which lists all countries, together with a regional classification (ISO 3166 itself assigns country codes; the UN M49 standard is one source that groups countries into regions such as Latin America and the Caribbean).
Getting such a list is straightforward. Then import it into SAS, merge it with your data by country, and finally flag the countries in Latin America.
I have shipment data for different transport types in a single sheet, as shown below for reference.
Trucksp Truckquty Filter Seasp Seaquty Filter STOsp STOquty Filter
45 66 TRUCK 55 67 SEA 34 45 STO
55 76 TRUCK 55 97 SEA 44 55 STO
45 66 TRUCK 25 27 SEA 22 88 STO
If I select the TRUCK filter, I want to show only the result of dividing trucksp by truckquty. The value shown should depend on the filter selection. Any ideas? Please help me.
Measure = DIVIDE(MAX(tbl[trucksp]), MAX(tbl[truckquty]))
I need help restructuring the data. My Table looks like this
NameHead Department Per_test Per_Delta Per_DB Per_Vul
Nancy Health 55 33.2 33 63
Jim Air 25 22.8 23 11
Shu Water 26 88.3 44 12
Dick Electricity 77 55.9 66 10
Elena General 88 22 67 9
Nancy Internet 66 12 44 79
And I want my table to look like this
NameHead Nancy Jim Shu Dick Elena Nancy
Department Health Air Water Electricity General Internet
Per_test 55 25 26 77 88 66
Per_Delta 33.2 22.8 88.3 55.9 22 12
Per_DB 33 23 44 66 67 44
Per_Vul 63 11 12 10 9 79
I tried PROC TRANSPOSE but couldn't get the desired result. Please help!
Thanks!
PROC TRANSPOSE does exactly what you want. You must include a VAR statement if you want to include the character variables.
proc transpose data=have out=want;
var _all_;
run;
Note that you cannot have variables that do not have names. Here is what the dataset looks like.
Obs _NAME_ COL1 COL2 COL3 COL4 COL5 COL6
1 NameHead Nancy Jim Shu Dick Elena Nancy
2 Department Health Air Water Electricity General Internet
3 Percent_test 55 25 26 77 88 66
4 Percent_Delta 33.2 22.8 88.3 55.9 22 12
5 Percent_DB 33 23 44 66 67 44
6 Percent_Vul 63 11 12 10 9 79
I have a spreadsheet in Calc with some records. One column contains information like the following:
Ecole Saint-Exupery
Rue Saint-Malo 24
67544 Paris
I need to have those lines divided into at least three columns:
name: Ecole Saint-Exupery
street: Rue Saint-Malo 24
postal code and town: 67544 Paris
Or, even better, with the postal code and the town split into two separate columns.
Question: is this possible? Can (or should) I do this in Calc (OpenDocument format)?
Do I need to use a regex and Perl, or can I solve this without a regex?
Note: in the end I need to transfer the data into a MySQL database.
I look forward to any tips.
Greetings
By the way, you can see all of this in a live example: http://192.68.214.70/km/asps/schulsuche.asp?q=a&a=50&s=1750 - see the fields
Schulname
Straße
PLZ Ort
These fields contain three things: the name, the street, and the postal code with the town.
Question: can this be divided into parts? If you copy the information and paste it into Calc, it all ends up in a single cell. How can I divide and separate that information into three cells, or even four?
I also tried converting the information to a hex dump, see the following:
Staatl. Realschule Grafenau
Rachelweg 20
94481 Grafenau
00000000: 53 74 61 61 74 6C 2E 20 52 65 61 6C 73 63 68 75
00000010: 6C 65 20 47 72 61 66 65 6E 61 75 20 0A 52 61 63
00000020: 68 65 6C 77 65 67 20 32 30 0A 39 34 34 38 31 20
00000030: 20 47 72 61 66 65 6E 61 75 20 20
But I do not know whether this helps here.
Can you help me solve the problem? Do I need a regex?
Many thanks in advance for any and all help!
You may not need a regex. You should be able to take the contents of the cell in question and split it up using the newline character that is present. I am not familiar with calc, but if there is a split() or explode() function that returns an array, then splitting on a newline will yield the 3 pieces you are looking for.
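If doing this outside Calc is an option, the split-on-newline idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `split_address` is made up here, each cell is assumed to hold exactly three lines separated by newline characters (as the 0A bytes in the hex dump suggest), and the last line is assumed to start with a numeric postal code followed by the town. A small regex is used only for that last split.

```python
import re


def split_address(cell: str) -> dict:
    """Split a three-line address cell into name, street, postal code and town.

    Hypothetical helper: assumes newline-separated lines and a numeric
    postal code at the start of the last line.
    """
    # Split on newlines, trim each piece, and drop empty pieces.
    lines = [part.strip() for part in cell.splitlines() if part.strip()]
    if len(lines) != 3:
        raise ValueError(f"expected 3 lines, got {len(lines)}")
    name, street, plz_town = lines

    # Separate the leading digits (postal code) from the rest (town).
    match = re.match(r"(\d+)\s+(.+)", plz_town)
    if match is None:
        raise ValueError(f"cannot split postal code and town: {plz_town!r}")

    return {
        "name": name,
        "street": street,
        "postal_code": match.group(1),
        "town": match.group(2),
    }


# Same bytes as in the hex dump above, including the stray spaces.
cell = "Staatl. Realschule Grafenau \nRachelweg 20\n94481  Grafenau  "
print(split_address(cell))
```

The resulting four fields map directly onto four columns, which is also a convenient shape for a later import into MySQL.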