Piecewise linear regression with SAS PHREG - sas

How to implement a piecewise linear regression model in PHREG procedure of SAS?
For example with one knot at X=T:
Y = β_10 + β_11 . X if X ≤ T
Y = β_20 + β_21 . X if X >T
Given the model with the constraint of continuity:
Y = β_10 + β_11 . X if X ≤ T
Y = β_10 + (β_11 - β_21) T + β_21 . X if X >T
i.e :
Y= β_0 + β_1 . X + S_1
where
S_1 = ( β_11 - β_21 ) T if X >T and 0 otherwise.
Finally i would like to include it in a Cox model:
Proc PHREG
Model time * cas (censure) = X S_1 ;
Run ;
But the problem is S_1 has unknown beta coefficients in it.
Thanks for your help!

Related

SAS color series separately in SGPLOT

Let's say I have four X-Y plots that I want to plot on the same figure. How can I color each line plot separately?
I've tried the below code:
proc sgplot data = band;
*styleattrs datacolors=(lightblue red green blue black purple brown yellow);
series x = X y = a /markerattrs=(color=green);
series x = X y = b /markerattrs=(color=blue);
series x = X y =c /markerattrs=(color=lightblue);
series x = X y =d /markerattrs=(color=red);
run;
Cheers.
you almost did it
proc sgplot data = band;
series x = X y = a /lineattrs=(color=green);
series x = X y = b /lineattrs=(color=blue);
series x = X y =c /lineattrs=(color=lightblue);
series x = X y =d /lineattrs=(color=red);
run;
By the way, datacolors= only take effect on different group, not different plot.

Multiplication of two variables for linear problems in glpk (gusek)

I'm trying to implemenet an assignment problem. I have the following problem when trying to multiply two variables in linear programming (using glpk gusek) in my goal function:
minimize PATH_COST: sum{k in Rodzaj_Transportu}(sum{z in numery_Zlecen}Koszty_Suma[k,z])*y[k,z]; #y is a binary variable; Koszty_Suma is total cost for ordez z and car type k
The following error is arising: "model.mod:47: multiplication of linear forms not allowed".
Code (.dat file):
data;
set numery_Zlecen := 1, 2, 3; #order numbers
set Miasta := '*some data: *' #cities.
#order numer (from city to city)
set Zlecenie[1] := Warszawa Paris;
set Zlecenie[2] := Berlin Praha;
set Zlecenie[3] := Praha Amsterdam;
#number of packages for transport for a particular order
param Ilosc_Wyrobow :=
1 10
2 50
3 110;
param Godziny_Pracy := 9; #number of working hours during the day
param Pojemnosc_Samochodu := 35; #capacity of the car (how many packages it can take)
param Srednia_Predkosc := 80; #average car speed
param Spalenie_Paliwa := 0.25; #fuel combustion
param Wynagrodzenie_za_Godzine := 20; #salary for one working hour
param Cena_Noclegu := 100; #price of accommodation
param Dystans: '*some data: *' #km between cities.
param Koszt_Paliwa : '*some data: *' #fuel consumption depends on country.
end;
Code (.mod file):
#INDEXY
#=====================================================================
set Miasta; #i,j
set numery_Zlecen; #z
set Zlecenie{numery_Zlecen} dimen 2; #p,q
set Rodzaj_Transportu; #k
#PARAMETRY
#=====================================================================
param Dystans {Miasta,Miasta};
param Ilosc_Wyrobow{numery_Zlecen};
param Godziny_Pracy >= 0;
param Pojemnosc_Samochodu {Rodzaj_Transportu}>= 0;
param Srednia_Predkosc >=0;
param Spalenie_Paliwa >=0;
param Koszt_Paliwa {Miasta,Miasta};
param Wynagrodzenie_za_Godzine >= 0;
param Cena_Noclegu >= 0;
#ZMIENE
#=====================================================================
var x{Miasta,Miasta,numery_Zlecen} <= 1, >= 0; #variable x equal 1 when we're going the path from city A to city B; otherwise it equals 0
var y{Rodzaj_Transportu,numery_Zlecen} binary <=1, >=0; #variable that shows what types of car/s we are using for order (can be 0 or 1)
var Koszty_Suma{Rodzaj_Transportu,numery_Zlecen}; #total costs
var Koszty_Transportu{numery_Zlecen}; #transport costs
var Koszty_Odpoczynku{numery_Zlecen}; #rest costs
var Koszty_Wynagrodzenia{numery_Zlecen}; #salary costs
#FUNKCjA CELU
#=====================================================================
minimize PATH_COST: sum{k in Rodzaj_Transportu}(sum{z in numery_Zlecen}Koszty_Suma[k,z])*y[k,z];
#OGRANICZENIA (constraints)
#=====================================================================
s.t. SOURCE{z in numery_Zlecen, (p,q) in Zlecenie[z], i in Miasta: i = p && p != q}:
sum {j in Miasta} (x[i ,j ,z ]) - sum {j in Miasta}( x[j ,i ,z ]) = 1;
s.t. INTERNAL {z in numery_Zlecen, (p,q) in Zlecenie[z],i in Miasta: i != p && i != q && p != q }:
sum {j in Miasta} (x[i ,j ,z ]) - sum {j in Miasta}( x[j ,i ,z ]) = 0;
s.t. OGR_KM_DZIEN{z in numery_Zlecen,(p,q) in Zlecenie[z], j in Miasta, i in Miasta: i != q}:
if (Dystans[i,j] > (Godziny_Pracy*Srednia_Predkosc)) and i != q then x[i,j,z] = 0;
s.t. OGR_KOSZTY_SUMA{z in numery_Zlecen, k in Rodzaj_Transportu}:
Koszty_Suma[k,z] = (Koszty_Transportu[z] + Koszty_Odpoczynku[z] + Koszty_Wynagrodzenia[z])*ceil(Ilosc_Wyrobow[z]/Pojemnosc_Samochodu[k]);
s.t. OGR_KOSZTY_TRANSPORTU{z in numery_Zlecen}:
Koszty_Transportu[z] = (sum{i in Miasta} (sum{j in Miasta} ( Dystans[i,j]*x[i,j, z]*Koszt_Paliwa[i,j] ) ))*Spalenie_Paliwa;
s.t. OGR_KOSZTY_ODPOCZYNKU{z in numery_Zlecen}:
Koszty_Odpoczynku[z] =
(sum{i in Miasta} (sum{j in Miasta} ( Dystans[i,j]*x[i,j, z] ) ))/(Godziny_Pracy*Srednia_Predkosc) * Cena_Noclegu;
s.t. OGR_KOSZTY_WYNAGRODZENIA{z in numery_Zlecen}:
Koszty_Wynagrodzenia[z] =
((sum{i in Miasta} (sum{j in Miasta} ( Dystans[i,j]*x[i,j, z] ) ))/(Srednia_Predkosc)) * Wynagrodzenie_za_Godzine;
s.t. OGR_Y_JEDEN{z in numery_Zlecen}:
sum{k in Rodzaj_Transportu}(y[k,z]) = 1;
solve;
How is it possible to get rid of this error? Any hints how to solve this kind of problem are welcome.
First I think the parentheses are incorrect (note that y[k,z] depends on z). The expression
sum{k in Rodzaj_Transportu}(sum{z in numery_Zlecen}Koszty_Suma[k,z])*y[k,z];
is not mathematically correct. So, I assume what you meant is:
sum{k in Rodzaj_Transportu}(sum{z in numery_Zlecen}Koszty_Suma[k,z]*y[k,z]);
Let me restate the problem a little bit. I assume we can write this as:
sum((i,j), x[i,j]*y[i,j])
with y a binary variable and x a continuous variable. I also assume 0 <= x[i,j] <= U[i,j]. (U is an upper bound).
Here is a way to linearize this quadratic term. We can introduce a variable z[i,j]=x[i,j]*y[i,j] using the following inequalities:
z[i,j] <= U[i,j]*y[i,j]
z[i,j] <= x[i,j]
z[i,j] >= x[i,j]-U[i,j]*(1-y[i,j])
0 <= z[i,j] <= U[i,j]
Now you just can minimize sum((i,j),z[i,j]). For a similar linearization see link.

How to calculate performance curve for each row of data

I want to plot a performance curve for each row of data I have.
A simple version of what I want to do is plot the function with the equation as Y= m*X+b, where I have a table with m and b values and I want Y values for X = 1 to 10.
How is this calculated?
A Y = mX + b example can be seen in the following plot:
The following works:
WITH NUMBERS AS
(
SELECT N FROM (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10))N(N)
),
Examples AS
(
SELECT m,b FROM (VALUES (1,2),(2,2))N(m,b)
)
SELECT
'Y = ' + CAST(Examples.m as varchar(10)) + 'X + ' + CAST(Examples.b as varchar(10)) AS Formula
,Numbers.N AS X
, Numbers.N * Examples.m + Examples.b
FROM Examples
CROSS JOIN NUMBERS

pyspark mathematical computation in a dataframe

I have extracted a Dataframe from a larger Dataframe, and now I need to do simple computation like addition and division in dataframe.
sample dataframe is like.
item counts
z 23156
x 15462
What I need to do is to divide x by sum of x and z
for example
value= x/x+z
You must compute the sum of x and first then divide x by sum(x) + sum(y)
for example:
Table 1(original table):
x z
1 2
3 4
Table 2 (Aggregated table):
table2 = sqlCtx.sql("select sum(x) + sum(z) as sum_xz")
table2.registerTempTable("table2")
sum_xz
10
Then join both table and divide
table3 = sqlCtx.sql("select a.x / bs.um_xz from table1 a join table2 b")
For your reference.

Error:'no variables defined' in stata when using monte carlo simulation

I have written the program below and keep getting the error message that my variables are not defined.
Can somebody plese see where the error is and how I should adapt the code? Really nothing seems to work.
program define myreg, rclass
drop all
set obs 200
gen x= 2*uniform()
gen z = rnormal(0,1)
gen e = (invnorm(uniform()))^2
e=e-r(mean)
replace e=e-r(mean)
more
gen y = 1 + 1*x +1*z + 1*e
reg y x z
e=e-r(mean)
replace e=e-r(mean)
more
gen y = 1 + 1*x +1*z + 1*e
reg y x z
more
return scalar b0 =_[_cons]
return scalar b1=_[x]
return scalar b2 =_[z]
more
end
simulate b_0 = r(b0) b_1 = r(b1) b_2 = r(b2), rep(1000): myreg
*A possible solution with eclass
capture program drop myreg
program define myreg, eclass
* create an empty data by dropping all variables
drop _all
set obs 200
gen x= 2*uniform()
gen z = rnormal(0,1)
gen e = (invnorm(uniform()))^2
qui sum e /*to get r(mean) you need to run sum first*/
replace e=e-r(mean)
gen y = 1 + 1*x +1*z + 1*e
reg y x z
end
*gather the coefficients (_b) and standard errors (_se) from the *regression each time
simulate _b _se, reps(1000) seed (123): myreg
* show the final result
mat list r(table)
* A possible solution with rclass
* To understand the difference between rclass and eclass, see the Stata manual(http://www.stata.com/manuals13/rstoredresults.pdf)
capture program drop myreg
program define myreg, rclass
drop _all
set obs 200
gen x= 2*uniform()
gen z = rnormal(0,1)
gen e = (invnorm(uniform()))^2
qui sum e
replace e=e-r(mean)
gen y = 1 + 1*x +1*z + 1*e
reg y x z
mat output=e(b)
return scalar b0=output[1,3]
return scalar b1=output[1,1]
return scalar b2=output[1,2]
end
simulate b_0=r(b0) b_1=r(b1) b_2=r(b2), rep(1000) seed (123): myreg
return list
*P.S. You should read all the comments as suggested by #Nick to fully understand what I did here. .