R: use apply function on an xts/zoo object

I am new to R and am trying to use the apply function on an xts/zoo object, but it throws an error. I have a formula: ((2*Close-High-Low)/(High-Low)) * Volume
Input:
y <- getSymbols("0005.HK", auto.assign = FALSE, src = "yahoo")
Error:
y$II <- apply(y,2,function(x) (2Cl(x) - Hi(x) - Lo(x)) / ((Hi(x) - Lo(x)) * Vo(stk)))
Error: unexpected symbol in "apply(y,2,function(x) (2Cl"
and then I tried another one:
Error:
y$II <- apply(y,2,function(x) (2(x[,4]) - x[,2] - x[,3]) / (x[,2] - x[,3]) * x[,5])
Error in FUN(newX[, i], ...) : attempt to apply non-function
After that, I would like to sum y$II over 21 days, but I don't know how to use apply to compute a sum over each 21-day window:
IIstd = Sum of 21 ((2*C-H-L)/(H-L)) * V
IInorm = (IIstd / Sum 21 day V) * 100
Can anyone help me? Please advise, thanks.

There are two problems here:
2Cl(x) is not valid R -- use 2 * Cl(x)
all operations on the right hand side are already vectorized so we do not need apply in the first place
For clarity, here we have assumed that II = (2C - H - L)/((H-L) * V) and that you want 100 times the 21-period volume-weighted moving average of that. Modify if that is not what you want.
Try this:
y$II <- (2*Cl(y) - Hi(y) - Lo(y)) / ((Hi(y) - Lo(y)) * Vo(y))
Regarding the second part of the question try this -- rollapplyr is in the zoo package.
wmean <- function(x) weighted.mean(x$II, Vo(x))
y$MeanII <- 100 * rollapplyr(y, 21, wmean, by.column = FALSE, fill = NA)
Also check out the TTR package.
UPDATE: Added answer to second part of question.
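As a language-neutral illustration of what the rolling step computes, here is a minimal Python sketch of a right-aligned rolling weighted mean. The function name rolling_weighted_mean and the sample numbers are made up for this example; it is not the zoo implementation, only the same arithmetic under the assumption that the first width-1 positions are left empty (as fill = NA does).

```python
def rolling_weighted_mean(values, weights, width):
    """Right-aligned rolling weighted mean; the first width-1 slots are None."""
    out = []
    for end in range(1, len(values) + 1):
        if end < width:
            out.append(None)  # not enough history yet
            continue
        v = values[end - width:end]
        w = weights[end - width:end]
        # weighted mean = sum(value * weight) / sum(weight) over the window
        out.append(sum(a * b for a, b in zip(v, w)) / sum(w))
    return out


ii = [0.5, -0.2, 0.1, 0.4]   # made-up indicator values
vol = [100, 300, 200, 400]   # made-up volumes
print(rolling_weighted_mean(ii, vol, 3))
```

With width = 21 and the result multiplied by 100, this would correspond to the MeanII column above.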


C++ find where a point set lies between two others

I have three sets of 2D points. What I need to do is find out where one sits in relation to the other two.
Every set has the same points, in the same order. One is 'neutral', one is 'max', and the third is unknown. What I need is to return a single value, between 0 and 1, that illustrates the amount that the unknown set is between the other two.
For example, in the image:
I would somehow get the 'distance' or 'weight' between Set A and Set B, then find out where Set C sits between them. In this example, I would expect a value of around 75%, or 0.75.
I have looked at using point set registration algorithms that return a scale amount to match Set C to Set B, but I am not convinced that this is the best way. What approach would be suitable for this problem? What algorithms should I be searching for?
You could try to solve this with a simple linear interpolation between the two sets. This works if the transition between the sets is indeed nearly linear. If you know that it is something else, you can adapt the interpolation function.
Let us focus on a single point p. We know its coordinates in all sets p_A, p_B, and p_C. Then, we specify that p_C is more or less a linear interpolation between p_A and p_B with parameter t (where t=0 represents set A and t=1 represents set B):
p_C       = (1 - t) * p_A + t * p_B
          = p_A - t * p_A + t * p_B
          = p_A + t * (p_B - p_A)
p_C - p_A = t * (p_B - p_A)
The question now is to find a t that approximately holds for all your points.
We can solve this by stating the problem as a linear least squares problem. I.e. we want to minimize the summed residuals (difference between left-hand sides and right-hand sides of the above equation) for all points:
arg min_t Σ_i [ (pi_C.x - pi_A.x - t * (pi_B.x - pi_A.x))^2
              + (pi_C.y - pi_A.y - t * (pi_B.y - pi_A.y))^2 ]
The optimal t is then:
numX = Σ_i (pi_A.x^2 - pi_A.x * pi_B.x - pi_A.x * pi_C.x + pi_B.x * pi_C.x)
numY = Σ_i (pi_A.y^2 - pi_A.y * pi_B.y - pi_A.y * pi_C.y + pi_B.y * pi_C.y)
denX = Σ_i (pi_A.x^2 - 2 * pi_A.x * pi_B.x + pi_B.x^2)
denY = Σ_i (pi_A.y^2 - 2 * pi_A.y * pi_B.y + pi_B.y^2)
t = (numX + numY) / (denX + denY)
If your points have higher dimension, just add the new dimension with the same pattern.
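The closed-form solution above can be sketched in a few lines. The example below is in Python rather than C++ and uses made-up names (interpolation_parameter, set_a, set_b, set_c) and made-up coordinates. It uses the factored form of the sums: each numerator term equals (c - a) * (b - a) and each denominator term equals (b - a)^2, which expand to the numX/numY and denX/denY expressions given above.

```python
def interpolation_parameter(set_a, set_b, set_c):
    """Least-squares t such that set_c ~ set_a + t * (set_b - set_a)."""
    num = 0.0
    den = 0.0
    for a, b, c in zip(set_a, set_b, set_c):
        for k in range(len(a)):  # works for any number of dimensions
            num += (c[k] - a[k]) * (b[k] - a[k])
            den += (b[k] - a[k]) ** 2
    if den == 0.0:
        raise ValueError("sets A and B are identical; t is undefined")
    return num / den


# Made-up data: set C is built to sit 75% of the way from A to B.
set_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
set_b = [(4.0, 0.0), (5.0, 4.0), (0.0, 5.0)]
set_c = [
    tuple(a_k + 0.75 * (b_k - a_k) for a_k, b_k in zip(a, b))
    for a, b in zip(set_a, set_b)
]
print(interpolation_parameter(set_a, set_b, set_c))  # close to 0.75
```

Note that t is not clamped to [0, 1]; if set C overshoots set B the result will exceed 1, which may or may not be what you want.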

Find the nearest match that could add up to zero in SAS

I am using SAS for research. My question is how to find the nearest match in the same column. Please see the following for a quick illustration. I am new to SAS programming, and only have a preliminary guess that proc sql might do the work. What I am doing now is manually adjusting - it is painful and especially so for over 3,000 observations.
I want to find the nearest "Value" match that could add up to zero. For example, for firm AA in 1st quarter 2000, I want to match the nearest two numbers that could add up to 100. I don't want the 50 for firm AA in 2002Q2 nor firm BB 2000Q4. In addition, I also struggle with the case for firm BB, and have no idea how to perform the matching: the two negative numbers add up to -200, the two positive numbers add up to +200, and they may be in the same or different years. To help you understand better, please find the following table for what I have in mind at the end of the day:
For the BB case, it can be 2001Q3 "-100" matched to "50" in 2000Q4, it is also fine if it matches to "100" in 2001Q1 - the order doesn't matter. Thanks in advance! Any help is really appreciated!
Regards,
Michael
At +/- 2 quarters, each row has at most 5 items that need to be checked in combination.
There are 15 combinations that include the current row (0 column) and at least one other row.
combo -2 -1 0 1 2
1 * * *
2 * *
3 * *
4 * * * *
5 * * *
6 * * *
7 * *
8 * * * * *
9 * * * *
10 * * * *
11 * * *
12 * * * *
13 * * *
14 * * *
15 * *
You could check all these combinations for each row to find your cases 'of sums to zero' in the context of combinations with replacement.
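The enumeration described above can be sketched as follows. This is an illustration in Python, not SAS, and the names (OFFSETS, window_combinations, zero_sum_matches) and the sample values are invented for the example. It assumes the rows of one firm are already sorted by quarter and that values are integers, so an exact comparison with zero is safe; with non-integer values a small tolerance would be needed.

```python
from itertools import combinations

OFFSETS = (-2, -1, 1, 2)  # neighbouring quarters relative to the current row


def window_combinations():
    """All 15 offset sets that contain 0 plus at least one neighbour."""
    combos = []
    for size in range(1, len(OFFSETS) + 1):
        for others in combinations(OFFSETS, size):
            combos.append(tuple(sorted(others + (0,))))
    return combos


def zero_sum_matches(values, row):
    """Offset sets around `row` whose values sum to zero."""
    matches = []
    for combo in window_combinations():
        idx = [row + off for off in combo]
        if min(idx) < 0 or max(idx) >= len(values):
            continue  # window runs off the start or end of the series
        if sum(values[i] for i in idx) == 0:
            matches.append(combo)
    return matches


values = [-100, 50, 50, 30, -30]  # made-up quarterly values for one firm
print(len(window_combinations()))
print(zero_sum_matches(values, 0))
```

For row 0 the only match is the current row plus the next two quarters (-100 + 50 + 50).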

Strange iteration results "error is nan" and RuntimeWarning using t-SNE

I am using t-SNE python implementation for dimensionality reduction on X which contains 100 instances each described by 1024 parameters for cnn visualization.
X.shape = [100, 1024]
X.dtype = float32
When I run :
Y = tsne.tsne(X)
The first warning pops out in tsne.py, line 23 :
RuntimeWarning: divide by zero encountered in log
H = Math.log(sumP) + beta * Math.sum(D * P) / sumP
Then there are a couple more warnings like this one on the following lines :
RuntimeWarning: invalid value encountered in divide
And finally I get this result after each iteration during the processing :
Iteration xyz : error is nan
The code ends without "errors" and I get an empty scatter plot at the end.
EDIT:
-> I have tried it with a different data set and it worked perfectly. However I would need it to work on my first set as well (the one that seems to cause problems)
Question :
Does anyone know what might be causing this? Is there a workaround?
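Add machine epsilon to sumP so that it can never be exactly zero (np here is numpy, which appears as Math in the line quoted in the question):
sumP = sum(P) + np.finfo(np.double).eps
H = np.log(sumP) + beta * np.sum(D * P) / sumP
This should fix the problem. The warnings appear when sumP is 0: log(0) triggers the divide-by-zero warning, and the following division by sumP then produces nan.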
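To see why sumP can reach zero and what the epsilon guard changes, here is a small standalone sketch using only the Python standard library instead of NumPy. The function name entropy, the guard flag, and the distance values are hypothetical; the formula mirrors the H line quoted above. Note one difference from NumPy: math.log(0.0) raises ValueError instead of emitting a RuntimeWarning and returning -inf.

```python
import math
import sys

EPS = sys.float_info.epsilon  # double-precision machine epsilon


def entropy(distances, beta, guard=True):
    """H = log(sumP) + beta * sum(D * P) / sumP, with an optional epsilon guard."""
    p = [math.exp(-d * beta) for d in distances]  # underflows to 0.0 for large d
    sum_p = sum(p) + (EPS if guard else 0.0)
    weighted = sum(d * pi for d, pi in zip(distances, p))
    return math.log(sum_p) + beta * weighted / sum_p


# Made-up squared distances, large enough that exp(-d * beta) underflows to zero.
far_apart = [1e4, 2e4, 3e4]
print(entropy(far_apart, beta=1.0))  # finite, because sum_p is at least EPS
```

If the guard is what makes your data set run, the underlying cause is probably very large pairwise distances; rescaling X (for example, dividing by its maximum absolute value) may be worth trying as well, though that is an assumption about your data rather than something the warnings prove.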

Solving for polynomial roots in Stata

I am trying to solve for the roots of a function in Stata. There is the "polyeval" command under Mata, but I am not sure how to apply it here. It seems to me as if under polyeval functions must follow a very clear structure of x^2 + x + c.
I would like to find out more about how to use Stata to solve this type of problem in general. But here is my current one, if that provides some idea of what I am working with.
I am currently trying to solve the Black (1976) American Options pricing model:
C = e^{-rt} [ F N(d1) - E N(d2)]
where,
d1 = [ln(F/E) + 1/2 sigma^2 t] / [sigma sqrt{t}]
d2 = d1 - sigma sqrt{t}
where C is the price of call option, t is time to expiration, r is interest rate, F is current futures price of contract, E is strike price, sigma is the annualized standard deviation of the futures contract. N(d1) and N(d2) are cumulative normal probability functions. All variables are known except for sigma.
As an aside, this seems to be really easy to do in R:
fun <- function(sigma) {
  # d1 and d2 as defined above; note the parentheses around sigma * sqrt(T)
  d1 <- (log(futures / Strike) + sigma^2 * T / 2) / (sigma * sqrt(T))
  d2 <- d1 - sigma * sqrt(T)
  exp(-int.rate * T) * (futures * pnorm(d1) - Strike * pnorm(d2)) - Option
}
uni <- uniroot(fun, c(0, 1), tol = 0.001 )
uni$root
Does anyone have any ideas/pointers on how to use Stata to solve this type of function?
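For reference, the root-finding task itself can be sketched independently of any particular package. The Python example below (not Stata) prices the call with the formula given above and recovers sigma by bisection, which works here because the call price increases with sigma. All names (norm_cdf, black76_call, implied_sigma), the bracket [1e-6, 5.0], and the sample inputs are assumptions made for this illustration.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black76_call(F, E, r, t, sigma):
    """C = exp(-r t) * (F * N(d1) - E * N(d2))."""
    d1 = (math.log(F / E) + 0.5 * sigma ** 2 * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return math.exp(-r * t) * (F * norm_cdf(d1) - E * norm_cdf(d2))


def implied_sigma(C, F, E, r, t, lo=1e-6, hi=5.0, tol=1e-8):
    """Bisection on sigma so that the model price matches the observed price C."""
    f_lo = black76_call(F, E, r, t, lo) - C
    f_hi = black76_call(F, E, r, t, hi) - C
    if f_lo * f_hi > 0:
        raise ValueError("no sign change on [lo, hi]; widen the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = black76_call(F, E, r, t, mid) - C
        if f_lo * f_mid <= 0:
            hi = mid          # root lies in the lower half
        else:
            lo, f_lo = mid, f_mid  # root lies in the upper half
    return 0.5 * (lo + hi)


# Round trip with made-up inputs: price at sigma = 0.25, then recover sigma.
price = black76_call(F=100.0, E=95.0, r=0.03, t=0.5, sigma=0.25)
print(implied_sigma(price, F=100.0, E=95.0, r=0.03, t=0.5))
```

The same loop structure should carry over to any language that offers a normal CDF and basic arithmetic.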

Calculate a difference in stata with if command

I want to calculate something like
by group: egen x if y==1 - x if y==2
Of course this is not real Stata code, but I'm kind of lost. In R this is simply done with "[]" after the variable of interest, but I'm not sure how to do it in Stata.
R would be
x[y==1] - x[y==2]
I would use reshape.
clear
version 11.2
set seed 2001
* generate data
set obs 100
generate y = 1 + mod(_n - 1, 2)
generate x = rnormal()
generate group = 1 + floor((_n - 1) / 2)
list in 1/10
* reshape to wide and difference
reshape wide x, i(group) j(y)
generate x_diff = x1 - x2
list in 1/5
I would use reshape in R, also. Otherwise can you be sure that everything is properly ordered to give you the difference you want?
There is likely a neat Mata solution, but I know very little Mata. You may find preserve and restore helpful if you're averse to reshaping.
Richard Herron makes a good point that a reshape to a different structure might be worthwhile. Here I focus on how to do it with the existing structure.
Assuming that there are precisely two observations for each distinct value of group, one with y == 1 and one with y == 2, then
bysort group (y) : gen diff = x[1] - x[2]
gives the difference between values of x, necessarily repeated for both observations in a group. An assumption-free method is
bysort group: egen mean_1 = mean(x / (y == 1))
by group: egen mean_2 = mean(x / (y == 2))
gen diff = mean_1 - mean_2
Consider expressions such as x / (y == 1). Here the denominator y == 1 is 1 when y is indeed 1 and 0 otherwise. Division by 0 yields missing in Stata, but the egen command here ignores those. So the first command of the three commands above yields the mean of x for observations for which y == 1 and the second the mean of x for observations for which y == 2. Other values of y (even missings) will be ignored. This method should agree with the first method when the first method is valid.
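The logic of the assumption-free method can also be written out explicitly. The sketch below is Python, not Stata, with invented names (group_mean_difference, rows) and invented data. It averages x separately for y == 1 and y == 2 within each group, ignores other values of y and missing x, and returns NaN when either mean is undefined, which mirrors the behaviour described above.

```python
import math
from collections import defaultdict


def group_mean_difference(rows):
    """rows: iterable of (group, y, x). Returns {group: mean(x|y==1) - mean(x|y==2)}."""
    # Per group, keep [running sum, count] for y == 1 and y == 2.
    sums = defaultdict(lambda: {1: [0.0, 0], 2: [0.0, 0]})
    for group, y, x in rows:
        acc = sums[group]  # registers the group even if this row is skipped
        if y in (1, 2) and x is not None:
            acc[y][0] += x
            acc[y][1] += 1
    diffs = {}
    for group, acc in sums.items():
        if acc[1][1] == 0 or acc[2][1] == 0:
            diffs[group] = math.nan  # one of the two means is undefined
        else:
            diffs[group] = acc[1][0] / acc[1][1] - acc[2][0] / acc[2][1]
    return diffs


rows = [
    ("a", 1, 5.0), ("a", 2, 3.0),
    ("b", 2, 1.0), ("b", 1, 4.0), ("b", 3, 99.0),  # y == 3 is ignored
    ("c", 1, 7.0),                                  # no y == 2 row
]
print(group_mean_difference(rows))
```

Because the result does not depend on row order, no sorting is needed, unlike the x[1] - x[2] approach.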
For a review of similar problems, see http://stata-journal.com/article.html?article=dm0055
In Stata the if referred to here is a qualifier (not a command).