Piecewise linear regression with SAS PHREG

How can I implement a piecewise linear regression model in the PHREG procedure of SAS?
For example, with one knot at X = T:
Y = β_10 + β_11 · X if X ≤ T
Y = β_20 + β_21 · X if X > T
Imposing continuity at the knot gives:
Y = β_10 + β_11 · X if X ≤ T
Y = β_10 + (β_11 - β_21) · T + β_21 · X if X > T
i.e.:
Y = β_0 + β_1 · X + S_1
where
S_1 = (β_11 - β_21) · T if X > T and 0 otherwise.
Finally, I would like to include it in a Cox model:
proc phreg;
model time * cas(censure) = X S_1;
run;
The problem is that S_1 contains unknown beta coefficients.
Thanks for your help!
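One common way around the unknown coefficients, sketched here as an assumption rather than as a confirmed answer to the question, is to rewrite the continuous model as Y = β_0 + β_1 · X + β_2 · max(X - T, 0), with β_0 = β_10, β_1 = β_11 and β_2 = β_21 - β_11. The hinge covariate max(X - T, 0) depends only on the data and the knot, so it can be computed beforehand and entered as an ordinary covariate. The Python check below (function names are illustrative, not from the question) confirms numerically that the two parameterizations agree:

```python
def two_segment(x, b10, b11, b21, knot):
    """Continuous two-segment line written with separate intercepts."""
    b20 = b10 + (b11 - b21) * knot  # continuity constraint at the knot
    return b10 + b11 * x if x <= knot else b20 + b21 * x


def hinge_form(x, b0, b1, b2, knot):
    """Same line written with a hinge term max(x - knot, 0)."""
    return b0 + b1 * x + b2 * max(x - knot, 0.0)


knot = 4.0
b10, b11, b21 = 1.0, 0.5, 2.0
grid = [i * 0.5 for i in range(21)]  # 0.0, 0.5, ..., 10.0

# Largest disagreement between the two forms over the grid.
max_gap = max(
    abs(two_segment(x, b10, b11, b21, knot)
        - hinge_form(x, b10, b11, b21 - b11, knot))
    for x in grid
)
print(max_gap)
```

In this form the coefficient on the hinge term is the change in slope after the knot, and the knot T itself is treated as known.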

Related

SAS color series separately in SGPLOT

Let's say I have four X-Y plots that I want to plot on the same figure. How can I color each line plot separately?
I've tried the below code:
proc sgplot data = band;
*styleattrs datacolors=(lightblue red green blue black purple brown yellow);
series x = X y = a /markerattrs=(color=green);
series x = X y = b /markerattrs=(color=blue);
series x = X y =c /markerattrs=(color=lightblue);
series x = X y =d /markerattrs=(color=red);
run;
Cheers.

You almost did it. Use lineattrs= instead of markerattrs=, since a series plot draws lines:
proc sgplot data = band;
series x = X y = a /lineattrs=(color=green);
series x = X y = b /lineattrs=(color=blue);
series x = X y =c /lineattrs=(color=lightblue);
series x = X y =d /lineattrs=(color=red);
run;
By the way, datacolors= only takes effect across different groups, not across different plot statements.

Multiplication of two variables for linear problems in glpk (gusek)

I'm trying to implement an assignment problem. I run into the following problem when multiplying two variables in linear programming (using GLPK/GUSEK) in my objective function:
minimize PATH_COST: sum{k in Rodzaj_Transportu}(sum{z in numery_Zlecen}Koszty_Suma[k,z])*y[k,z]; #y is a binary variable; Koszty_Suma is the total cost for order z and car type k
The following error arises: "model.mod:47: multiplication of linear forms not allowed".
Code (.dat file):
data;
set numery_Zlecen := 1, 2, 3; #order numbers
set Miasta := '*some data: *' #cities.
#order number (from city to city)
set Zlecenie[1] := Warszawa Paris;
set Zlecenie[2] := Berlin Praha;
set Zlecenie[3] := Praha Amsterdam;
#number of packages for transport for a particular order
param Ilosc_Wyrobow :=
1 10
2 50
3 110;
param Godziny_Pracy := 9; #number of working hours during the day
param Pojemnosc_Samochodu := 35; #capacity of the car (how many packages it can take)
param Srednia_Predkosc := 80; #average car speed
param Spalenie_Paliwa := 0.25; #fuel combustion
param Wynagrodzenie_za_Godzine := 20; #salary for one working hour
param Cena_Noclegu := 100; #price of accommodation
param Dystans: '*some data: *' #km between cities.
param Koszt_Paliwa : '*some data: *' #fuel consumption depends on country.
end;
Code (.mod file):
#INDEKSY (indices)
#=====================================================================
set Miasta; #i,j
set numery_Zlecen; #z
set Zlecenie{numery_Zlecen} dimen 2; #p,q
set Rodzaj_Transportu; #k
#PARAMETRY (parameters)
#=====================================================================
param Dystans {Miasta,Miasta};
param Ilosc_Wyrobow{numery_Zlecen};
param Godziny_Pracy >= 0;
param Pojemnosc_Samochodu {Rodzaj_Transportu}>= 0;
param Srednia_Predkosc >=0;
param Spalenie_Paliwa >=0;
param Koszt_Paliwa {Miasta,Miasta};
param Wynagrodzenie_za_Godzine >= 0;
param Cena_Noclegu >= 0;
#ZMIENNE (variables)
#=====================================================================
var x{Miasta,Miasta,numery_Zlecen} <= 1, >= 0; #variable x equal 1 when we're going the path from city A to city B; otherwise it equals 0
var y{Rodzaj_Transportu,numery_Zlecen} binary <=1, >=0; #variable that shows what types of car/s we are using for order (can be 0 or 1)
var Koszty_Suma{Rodzaj_Transportu,numery_Zlecen}; #total costs
var Koszty_Transportu{numery_Zlecen}; #transport costs
var Koszty_Odpoczynku{numery_Zlecen}; #rest costs
var Koszty_Wynagrodzenia{numery_Zlecen}; #salary costs
#FUNKCJA CELU (objective function)
#=====================================================================
minimize PATH_COST: sum{k in Rodzaj_Transportu}(sum{z in numery_Zlecen}Koszty_Suma[k,z])*y[k,z];
#OGRANICZENIA (constraints)
#=====================================================================
s.t. SOURCE{z in numery_Zlecen, (p,q) in Zlecenie[z], i in Miasta: i = p && p != q}:
sum {j in Miasta} (x[i ,j ,z ]) - sum {j in Miasta}( x[j ,i ,z ]) = 1;
s.t. INTERNAL {z in numery_Zlecen, (p,q) in Zlecenie[z],i in Miasta: i != p && i != q && p != q }:
sum {j in Miasta} (x[i ,j ,z ]) - sum {j in Miasta}( x[j ,i ,z ]) = 0;
s.t. OGR_KM_DZIEN{z in numery_Zlecen,(p,q) in Zlecenie[z], j in Miasta, i in Miasta: i != q}:
if (Dystans[i,j] > (Godziny_Pracy*Srednia_Predkosc)) and i != q then x[i,j,z] = 0;
s.t. OGR_KOSZTY_SUMA{z in numery_Zlecen, k in Rodzaj_Transportu}:
Koszty_Suma[k,z] = (Koszty_Transportu[z] + Koszty_Odpoczynku[z] + Koszty_Wynagrodzenia[z])*ceil(Ilosc_Wyrobow[z]/Pojemnosc_Samochodu[k]);
s.t. OGR_KOSZTY_TRANSPORTU{z in numery_Zlecen}:
Koszty_Transportu[z] = (sum{i in Miasta} (sum{j in Miasta} ( Dystans[i,j]*x[i,j, z]*Koszt_Paliwa[i,j] ) ))*Spalenie_Paliwa;
s.t. OGR_KOSZTY_ODPOCZYNKU{z in numery_Zlecen}:
Koszty_Odpoczynku[z] =
(sum{i in Miasta} (sum{j in Miasta} ( Dystans[i,j]*x[i,j, z] ) ))/(Godziny_Pracy*Srednia_Predkosc) * Cena_Noclegu;
s.t. OGR_KOSZTY_WYNAGRODZENIA{z in numery_Zlecen}:
Koszty_Wynagrodzenia[z] =
((sum{i in Miasta} (sum{j in Miasta} ( Dystans[i,j]*x[i,j, z] ) ))/(Srednia_Predkosc)) * Wynagrodzenie_za_Godzine;
s.t. OGR_Y_JEDEN{z in numery_Zlecen}:
sum{k in Rodzaj_Transportu}(y[k,z]) = 1;
solve;
How is it possible to get rid of this error? Any hints how to solve this kind of problem are welcome.

First I think the parentheses are incorrect (note that y[k,z] depends on z). The expression
sum{k in Rodzaj_Transportu}(sum{z in numery_Zlecen}Koszty_Suma[k,z])*y[k,z];
is not mathematically correct. So, I assume what you meant is:
sum{k in Rodzaj_Transportu}(sum{z in numery_Zlecen}Koszty_Suma[k,z]*y[k,z]);
Let me restate the problem a little bit. I assume we can write this as:
sum((i,j), x[i,j]*y[i,j])
with y a binary variable and x a continuous variable. I also assume 0 <= x[i,j] <= U[i,j]. (U is an upper bound).
Here is a way to linearize this quadratic term. We can introduce a variable z[i,j]=x[i,j]*y[i,j] using the following inequalities:
z[i,j] <= U[i,j]*y[i,j]
z[i,j] <= x[i,j]
z[i,j] >= x[i,j]-U[i,j]*(1-y[i,j])
0 <= z[i,j] <= U[i,j]
Now you can simply minimize sum((i,j), z[i,j]). For a similar linearization see link.
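To see why these inequalities force z to equal x*y, the small Python check below computes the interval of z values they allow for a given x, y and U. It is only an illustrative sketch (the helper name and the sample values are made up): for binary y and 0 <= x <= U the interval collapses to the single point x*y.

```python
def z_bounds(x, y, U):
    """Interval of z allowed by the four linearization inequalities."""
    lower = max(0.0, x - U * (1 - y))  # z >= 0 and z >= x - U*(1 - y)
    upper = min(U * y, x, U)           # z <= U*y, z <= x, z <= U
    return lower, upper


U = 10.0
results = []
for y in (0, 1):
    for x in (0.0, 2.5, 7.0, 10.0):
        lower, upper = z_bounds(x, y, U)
        results.append((x, y, lower, upper))
        print(x, y, lower, upper)
```

With y = 0 both bounds are 0, and with y = 1 both bounds equal x, so the product is represented exactly without any quadratic term. Note that this relies on x having a finite upper bound U.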

How to calculate performance curve for each row of data

I want to plot a performance curve for each row of data I have.
A simple version of what I want to do is plot the function with the equation as Y= m*X+b, where I have a table with m and b values and I want Y values for X = 1 to 10.
How is this calculated?
A Y = mX + b example can be seen in the following plot:

The following works:
WITH Numbers AS
(
SELECT N FROM (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10))N(N)
),
Examples AS
(
SELECT m,b FROM (VALUES (1,2),(2,2))N(m,b)
)
SELECT
'Y = ' + CAST(Examples.m AS varchar(10)) + 'X + ' + CAST(Examples.b AS varchar(10)) AS Formula
,Numbers.N AS X
,Numbers.N * Examples.m + Examples.b AS Y
FROM Examples
CROSS JOIN Numbers;
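The same idea, pairing every (m, b) row with every X value and evaluating Y = m*X + b, can be sketched in Python. This is only an illustration of the cross join; the (m, b) values mirror the example rows in the query and the variable names are made up.

```python
from itertools import product

# Example (m, b) rows, mirroring the Examples CTE in the query.
examples = [(1, 2), (2, 2)]
xs = range(1, 11)  # X = 1 to 10

# Cross join: every (m, b) row is paired with every X value.
curve = [
    {"formula": f"Y = {m}X + {b}", "x": x, "y": m * x + b}
    for (m, b), x in product(examples, xs)
]

for row in curve[:3]:
    print(row)
```

Each input row yields ten output rows, one per X value, which is the shape a plotting tool usually expects for one curve per row.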

pyspark mathematical computation in a dataframe

I have extracted a DataFrame from a larger DataFrame, and now I need to do simple computations such as addition and division on it.
The sample DataFrame looks like this:
item counts
z 23156
x 15462
What I need to do is divide x by the sum of x and z,
for example:
value = x / (x + z)

You must first compute the sum of x and z, then divide x by sum(x) + sum(z).
For example:
Table 1 (original table, registered as table1):
x z
1 2
3 4
Table 2 (aggregated table):
table2 = sqlCtx.sql("select sum(x) + sum(z) as sum_xz from table1")
table2.registerTempTable("table2")
sum_xz
10
Then join both tables and divide:
table3 = sqlCtx.sql("select a.x / b.sum_xz from table1 a join table2 b")
For your reference.
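As a side note on the arithmetic itself, the plain-Python sketch below uses the counts from the question's sample data to show why the parentheses matter. It does not use Spark; it only illustrates operator precedence.

```python
# Counts taken from the sample DataFrame in the question.
counts = {"z": 23156, "x": 15462}
x, z = counts["x"], counts["z"]

# Division binds tighter than addition, so this is (x / x) + z.
unparenthesized = x / x + z

# The intended ratio: x divided by the sum of x and z.
share = x / (x + z)

print(unparenthesized, share)
```

The first expression returns 1 plus z rather than a proportion, so the denominator has to be computed (or parenthesized) before dividing.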

Error:'no variables defined' in stata when using monte carlo simulation

I have written the program below and keep getting the error message that no variables are defined.
Can somebody please see where the error is and how I should adapt the code? Nothing seems to work.
program define myreg, rclass
drop all
set obs 200
gen x= 2*uniform()
gen z = rnormal(0,1)
gen e = (invnorm(uniform()))^2
e=e-r(mean)
replace e=e-r(mean)
more
gen y = 1 + 1*x +1*z + 1*e
reg y x z
e=e-r(mean)
replace e=e-r(mean)
more
gen y = 1 + 1*x +1*z + 1*e
reg y x z
more
return scalar b0 =_[_cons]
return scalar b1=_[x]
return scalar b2 =_[z]
more
end
simulate b_0 = r(b0) b_1 = r(b1) b_2 = r(b2), rep(1000): myreg

*A possible solution with eclass
capture program drop myreg
program define myreg, eclass
* create an empty data by dropping all variables
drop _all
set obs 200
gen x= 2*uniform()
gen z = rnormal(0,1)
gen e = (invnorm(uniform()))^2
qui sum e /*to get r(mean) you need to run sum first*/
replace e=e-r(mean)
gen y = 1 + 1*x +1*z + 1*e
reg y x z
end
*gather the coefficients (_b) and standard errors (_se) from the regression each time
simulate _b _se, reps(1000) seed(123): myreg
* show the final result
mat list r(table)
* A possible solution with rclass
* To understand the difference between rclass and eclass, see the Stata manual (http://www.stata.com/manuals13/rstoredresults.pdf)
capture program drop myreg
program define myreg, rclass
drop _all
set obs 200
gen x= 2*uniform()
gen z = rnormal(0,1)
gen e = (invnorm(uniform()))^2
qui sum e
replace e=e-r(mean)
gen y = 1 + 1*x +1*z + 1*e
reg y x z
mat output=e(b)
return scalar b0=output[1,3]
return scalar b1=output[1,1]
return scalar b2=output[1,2]
end
simulate b_0=r(b0) b_1=r(b1) b_2=r(b2), reps(1000) seed(123): myreg
return list
*P.S. You should read all the comments as suggested by @Nick to fully understand what I did here.
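For readers who want to see the logic of the simulation outside Stata, here is a rough standard-library Python sketch of the same experiment: draw x, z and a centered squared-normal error, build y = 1 + x + z + e, fit OLS, and repeat. The helper names, the hand-written 3x3 solver and the reduced number of replications are assumptions made for illustration, not a translation of Stata's simulate.

```python
import random


def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def one_replication(rng, n_obs=200):
    """Generate one data set and return the OLS estimates [b0, b1, b2]."""
    xs = [2 * rng.random() for _ in range(n_obs)]
    zs = [rng.gauss(0, 1) for _ in range(n_obs)]
    es = [rng.gauss(0, 1) ** 2 for _ in range(n_obs)]
    mean_e = sum(es) / n_obs          # the mean must be computed first
    es = [e - mean_e for e in es]     # then subtracted to center e
    ys = [1 + x + z + e for x, z, e in zip(xs, zs, es)]
    rows = [(1.0, x, z) for x, z in zip(xs, zs)]
    # Normal equations (X'X) b = X'y for the regressors (1, x, z).
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    return solve3(xtx, xty)


rng = random.Random(123)
draws = [one_replication(rng) for _ in range(200)]
means = [sum(d[k] for d in draws) / len(draws) for k in range(3)]
print(means)
```

The averages of the three estimates should land close to the true values of 1 used to generate y, which is the check the Monte Carlo exercise is meant to provide.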
