I have two data frames, shares and Log_return. I want to multiply the first value of shares with the first column of Log_return, the second value of shares with the second column, and so on.
shares

           0
0   0.319297
1  99.680703

Log_return

       HIGH       MID
0 -1.998061 -1.991331
1 -0.014573 -1.981635
2 -2.015117 -1.978619
3  0.028488 -2.000455
I have tried the following code, but it prints the products one by one instead of returning a data frame:
for j in range(Log_return.shape[1]):
    i = j
    for k in range(len(Log_return)):
        print(shares.iloc[i, 0] * Log_return.iloc[k, j])
-0.6379746350243671
-0.004652966683445539
-0.6434205307519527
0.009096189190505598
-198.49724749816215
-197.53080296760479
-197.23014837824016
-199.40677745293362
I would like help producing this result instead:
Log_return

       HIGH         MID
B -0.637975 -198.497247
C -0.004653 -197.530803
D -0.643421 -197.230148
E  0.009096 -199.406777
Using numpy:

import numpy as np
np.multiply(Log_return, shares.values.ravel())

should give the desired output. Pulling the share weights out as a plain 1-D array lets them broadcast across the columns instead of pandas trying to align the mismatched row and column labels.
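A minimal reproducible sketch (constructing the frames below is my own addition, mirroring the data in the question):

import pandas as pd

shares = pd.DataFrame({0: [0.319297, 99.680703]})
Log_return = pd.DataFrame({
    'HIGH': [-1.998061, -0.014573, -2.015117, 0.028488],
    'MID':  [-1.991331, -1.981635, -1.978619, -2.000455],
})

# A 1-D array whose length equals the number of columns broadcasts
# column-wise: 'HIGH' is scaled by the first share, 'MID' by the second.
result = Log_return * shares[0].values
print(result)

Here shares[0] is the column labelled 0 in the question's shares frame.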
Goal: perform rolling window calculations on panel data in Stata with variables PanelVar, TimeVar, and Var1, where the window can change within a loop over different window sizes.
Problem: no access to SSC for the packages that would take care of this (like rangestat)
I know that
by PanelVar: gen Var1_1 = Var1[_n]
produces a copy of Var1 in Var1_1. So I thought it would make sense to try
by PanelVar: gen Var1SumLag = sum(Var1[(_n-3)/_n])
to produce a rolling window calculation from _n-3 to _n for the whole variable. But it fails to produce the results I want; it just produces zeros.
You could use sum(Var1) - sum(Var1[_n-3]), but I also want to be able to make the rolling window left justified (summing future observations) as well as right justified (summing past observations).
Essentially I would like to replicate Python's ".rolling().agg()" functionality.
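For reference, a minimal pandas sketch of the behaviour I mean (Var1 stands in for the Stata variable; with panel data this would sit inside a groupby over PanelVar):

import pandas as pd

df = pd.DataFrame({'Var1': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]})

# Right-justified: sum of the current observation and the 3 previous ones.
df['right'] = df['Var1'].rolling(window=4).sum()

# Left-justified: sum of the current observation and the 3 following ones,
# computed by rolling over the reversed series and reversing back.
df['left'] = df['Var1'][::-1].rolling(window=4).sum()[::-1]

print(df)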
In Stata, _n is the index of the current observation. The expression (_n - 3) / _n yields -2 when _n is 1 and increases slowly with _n, but is always less than 1. Stata rounds down expressions supplied as subscripts, so this one reduces to -2, -1 or 0, and in each case it yields missing values when given as a subscript. Experiment will show you that, given any numeric variable, say numvar, references to numvar[-2] or numvar[-1] or numvar[0] all yield missing values.

Otherwise put, you seem to be hoping that the / yields a set of subscripts that return a sequence you can sum over, but that is a long way from what Stata will do in that context: the / is just interpreted as division.

(The running sum of missings is always returned as 0, which is an expression of missings being ignored in that calculation: just as 2 + 3 + . + 4 is returned as 9, so also . + . + . + . is returned as 0.)
A fairly general way to do what you want is to use time series operators, and this is strongly preferable to subscripts as it (1) does the right thing with gaps and (2) automatically works for panels too. Thus, after a tsset or xtset,
L0.numvar + L1.numvar + L2.numvar + L3.numvar
yields the sum of the current value and the three previous and
L0.numvar + F1.numvar + F2.numvar + F3.numvar
yields the sum of the current value and the three next. If any of these terms is missing, the sum will be too; a work-around for that is to return say
cond(missing(L3.numvar), 0, L3.numvar)
More general code will require some kind of loop.
Given a desire to loop over lags (negative) and leads (positive), some code might look like this, given a range of subscripts as local macros i <= j:
* example i and j
local i = -3
local j = 0

gen double wanted = 0

forval k = `i'/`j' {
    if `k' < 0 {
        local k1 = -(`k')
        replace wanted = wanted + L`k1'.numvar
    }
    else replace wanted = wanted + F`k'.numvar
}
Alternatively, use Mata.
EDIT: There's a simpler method: use tssmooth ma to get moving averages and then multiply up by the number of terms.
tssmooth ma wanted1=numvar, w(3 1)
tssmooth ma wanted2=numvar, w(0 1 3)
replace wanted1 = 4 * wanted1
replace wanted2 = 4 * wanted2
Note that in contrast to the method above tssmooth ma uses whatever is available at the beginning and end of each panel. So, the first moving average, the average of the first value and the three previous, is returned as just the first value at the beginning of each panel (when the three previous values are unknown).
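In the pandas terms the question starts from, this edge behaviour corresponds roughly to allowing partial windows via min_periods (a sketch of my own; for panel data it would sit inside a groupby over the panel identifier):

import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Sum of the current value and up to three previous ones; at the start of
# the series the window is simply shorter, mirroring how tssmooth ma uses
# whatever is available at the beginning of each panel.
print(s.rolling(window=4, min_periods=1).sum())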
I am trying to loop through my data and, every time a threshold is exceeded, raise a flag and count it. At the end I want a data frame containing the rows that were flagged and their corresponding information.
I have gotten this far:
frame_of_reference = frame_of_reference.apply(pd.to_numeric, errors='coerce')
window_size = 10
for i in range(1, len(frame_of_reference['Number_of_frames']), window_size):
    events_ttc = [1 if 0.0 < frame_of_reference['TTC_radar'].any() <= 1.5 else 0]
events_ttc
but instead of giving me a dataframe, it only gives me a one or a zero.
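For what it's worth, frame_of_reference['TTC_radar'].any() reduces the whole column to a single boolean, so the comparison can only ever produce one 0 or 1. A minimal sketch of a row-wise mask that would keep the flagged rows instead (assuming TTC_radar holds the values being thresholded):

mask = (frame_of_reference['TTC_radar'] > 0.0) & (frame_of_reference['TTC_radar'] <= 1.5)
flagged = frame_of_reference[mask]  # the flagged rows, with all their columns
count = mask.sum()                  # how many rows crossed the threshold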
I need to count the number of items in my output.
So for example I created this:
a = 1000000
while a >= 10:
    print a
    a = a / 2
How would I count how many halving steps were carried out?
Thanks
You have two ways: the empirical way and the predictive way.
import math

a = 1000000
print("theoretical iterations {}".format(int(math.log2(a // 10) + 0.5)))

counter = 0
while a >= 10:
    counter += 1
    a //= 2
print("real iterations {}".format(counter))
I get:
theoretical iterations 17
real iterations 17
The empirical method just counts the iterations, whereas the predictive method computes the log2 of a // 10 and rounds it up, which matches the complexity of the algorithm: log2(100000) ≈ 16.61, so 17 iterations are needed.
(It's rounded to the upper bound because if it's more than 16, then you need 17 iterations.)
c = 0
a = 1000000
while a >= 10:
    print a
    a = a / 2
    c = c + 1
I have this program that is supposed to search for perfect numbers.
(X is a perfect number if the sum of all numbers that divide X, including X itself, divided by 2, is equal to X:
sum / 2 = X
For example, 28 is perfect because (1 + 2 + 4 + 7 + 14 + 28) / 2 = 28.)
Now it has found the first four, which were known in Ancient Greece, so it's not really anything awesome.
The next one should be 33550336.
I know it is a big number, but the program has been going for about 50 minutes, and still hasn't found 33550336.
Is it because I opened the .txt file where I store all the perfect numbers while the program was running, or is it because I don't have a PC fast enough to run it*, or because I'm using Python?
*NOTE: This same PC factorized 500 000 in 10 minutes (while also running the perfect number program and Google Chrome with 3 YouTube tabs), also using Python.
Here is the code to the program:
i = 2
a = open("perfect.txt", 'w')
a.close()
while True:
    sum = 0
    for x in range(1, i+1):
        if i%x == 0:
            sum += x
    if sum / 2 == i:
        a = open("perfect.txt", 'a')
        a.write(str(i) + "\n")
        a.close()
    i += 1
The next one should be 33550336.
Your code (I fixed the indentation so that it does in principle what you want):
i = 2
a = open("perfect.txt", 'w')
a.close()
while True:
    sum = 0
    for x in range(1, i+1):
        if i%x == 0:
            sum += x
    if sum / 2 == i:
        a = open("perfect.txt", 'a')
        a.write(str(i) + "\n")
        a.close()
    i += 1
does i divisions to find the divisors of i.
So to find the perfect numbers up to n, it does
2 + 3 + 4 + ... + (n-1) + n = n*(n+1)/2 - 1
divisions in the for loop.
Now, for n = 33550336, that would be
Prelude> 33550336 * (33550336 + 1) `quot` 2 - 1
562812539631615
roughly 5.6 * 10^14 divisions.
Assuming your CPU could do 10^9 divisions per second (it most likely can't, 10^8 is a better estimate in my experience, but even that is for machine ints in C), that would take about 560,000 seconds. One day has 86400 seconds, so that would be roughly six and a half days (more than two months with the 10^8 estimate).
Your algorithm is just too slow to reach that in reasonable time.
If you don't want to use number theory (even perfect numbers have a very simple structure, and if there are any odd perfect numbers, those are necessarily huge), you can still do better by dividing only up to the square root to find the divisors:
i = 2
a = open("perfect.txt", 'w')
a.close()
while True:
    sum = 1
    root = int(i**0.5)
    for x in range(2, root+1):
        if i%x == 0:
            sum += x + i/x
    if i == root*root:
        sum -= x  # if i is a square, we have counted the square root twice
    if sum == i:
        a = open("perfect.txt", 'a')
        a.write(str(i) + "\n")
        a.close()
    i += 1
That only needs about 1.3 * 10^11 divisions and should find the fifth perfect number in a couple of hours.
Without resorting to the explicit formula for even perfect numbers (2^(p-1) * (2^p - 1) for primes p such that 2^p - 1 is prime), you can speed it up somewhat by finding the prime factorisation of i and computing the divisor sum from that. That will make the test faster for all composite numbers, and much faster for most:
def factorisation(n):
    facts = []
    multiplicity = 0
    while n%2 == 0:
        multiplicity += 1
        n = n // 2
    if multiplicity > 0:
        facts.append((2, multiplicity))
    d = 3
    while d*d <= n:
        if n % d == 0:
            multiplicity = 0
            while n % d == 0:
                multiplicity += 1
                n = n // d
            facts.append((d, multiplicity))
        d += 2
    if n > 1:
        facts.append((n, 1))
    return facts

def divisorSum(n):
    # sigma(n) = product over prime powers p^e of (p^(e+1) - 1)/(p - 1)
    f = factorisation(n)
    sum = 1
    for (p, e) in f:
        sum *= (p**(e+1) - 1)/(p-1)
    return sum

def isPerfect(n):
    return divisorSum(n) == 2*n

i = 2
count = 0
out = 10000
while count < 5:
    if isPerfect(i):
        print i
        count += 1
    if i == out:
        print "At", i
        out *= 5
    i += 1
would take an estimated 40 minutes on my machine.
Not a bad estimate:
$ time python fastperf.py
6
28
496
8128
33550336
real 36m4.595s
user 36m2.001s
sys 0m0.453s
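As a quick sanity check of the divisor-sum route (my own interactive session, Python 2 as above; 28 = 2^2 * 7):

>>> factorisation(28)
[(2, 2), (7, 1)]
>>> divisorSum(28)   # (2**3 - 1)/(2 - 1) * (7**2 - 1)/(7 - 1) = 7 * 8
56
>>> isPerfect(28)
True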
It is very hard to deduce why this has happened. I would suggest that you run your program under a debugger and test several iterations manually to check whether the code is really correct (I know you have already calculated 4 numbers, but still). Alternatively, run your program under a Python profiler just to see if it hasn't accidentally blocked on a lock or something.
It is possible, but not likely, that this is an issue related to opening the file while the program is running. If it were an issue, there would probably have been some error message and/or a program close/crash.
I would edit the program to write log-type output to a file every so often. For example, every time you have processed a target number that is an even multiple of 1 million, write (open-append-close) the date-time, the current number, and the last successful number to a log file.
You could then type the file once in a while to measure progress.
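A minimal sketch of that logging idea (the names target_number and last_success are placeholders for whatever the main loop tracks):

import datetime

def log_progress(target_number, last_success, logfile="progress.log"):
    # Append a timestamped progress line every millionth number processed.
    if target_number % 1000000 == 0:
        with open(logfile, "a") as f:
            f.write("{} current={} last_success={}\n".format(
                datetime.datetime.now(), target_number, last_success))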
OK, as requested, I will add more info so that you understand why a simple vector operation is not possible. It's not easy to explain in a few words, but let's see. I have a huge number of points over a 2D space.
I divide my space into a grid with a given resolution, say, 100 m. The main loop, which I am not sure is mandatory (any alternative is welcome), goes through EACH cell/pixel that contains at least 2 points (right now I am using the function quadratcount within the package spatstat).
Inside this loop, thus for each of these non-empty cells, I have to find and keep only a maximum of 10 Male-Female pairs that are within 3 meters of each other. The 3-meter buffer can be built using the disc function within spatstat. To select points falling inside a buffer you can use the function pnt.in.poly within the SDMTools package. All that because pixels have a maximum capacity that cannot be exceeded. Since each cell can contain hundreds or thousands of points, I am trying to find a smart way to use another loop/similar method to:
1) go through each point, one at a time;
2) create a buffer and select points of the opposite sex;
3) save the closest Male-Female (0-1) pair in another data frame (called new_colonies);
4) remove those points from the data frame so that it shrinks and I don't have to consider them anymore;
5) as soon as that new data frame reaches 10 rows, stop everything and go to the next cell (thus skipping all remaining points).
Here is the code that I developed to be run within each cell (right now it takes too long):
head(df,20):
X Y Sex ID
2 583058.2 2882774 1 1
3 582915.6 2883378 0 2
4 582592.8 2883297 1 3
5 582793.0 2883410 1 4
6 582925.7 2883397 1 5
7 582934.2 2883277 0 6
8 582874.7 2883336 0 7
9 583135.9 2882773 1 8
10 582955.5 2883306 1 9
11 583090.2 2883331 0 10
12 582855.3 2883358 1 11
13 582908.9 2883035 1 12
14 582608.8 2883715 0 13
15 582946.7 2883488 1 14
16 582749.8 2883062 0 15
17 582906.4 2883317 0 16
18 582598.9 2883390 0 17
19 582890.2 2883413 0 18
20 582752.8 2883361 0 19
21 582953.1 2883230 1 20
Inside each cell I must run something according to what I explained above:
new_colonies <- data.frame(ID1=0, ID2=0, X=0, Y=0)
for (i in 1:dim(df)[1]) {
    discbuff <- disc(radius, centre=c(df$X[i], df$Y[i]))
    # define the points and polygon
    pnts = cbind(df$X[-i], df$Y[-i])
    polypnts = cbind(x = discbuff$bdry[[1]]$x, y = discbuff$bdry[[1]]$y)
    out = pnt.in.poly(pnts, polypnts)
    out$ID <- df$ID[-i]
    if (any(out$pip == 1)) {
        pnt.inBuffID <- out$ID[which(out$pip == 1)]
        cond <- df$Sex[i] != df$Sex[pnt.inBuffID]
        if (any(cond)) {
            eucdist <- sqrt((df$X[i] - df$X[pnt.inBuffID][cond])^2 + (df$Y[i] - df$Y[pnt.inBuffID][cond])^2)
            IDvect <- pnt.inBuffID[cond]
            new_colonies_temp <- data.frame(ID1=df$ID[i],
                                            ID2=IDvect[which(eucdist == min(eucdist))],
                                            X=(df$X[i] + df$X[pnt.inBuffID][cond][which(eucdist == min(eucdist))]) / 2,
                                            Y=(df$Y[i] + df$Y[pnt.inBuffID][cond][which(eucdist == min(eucdist))]) / 2)
            new_colonies <- rbind(new_colonies, new_colonies_temp)
            if (dim(new_colonies)[1] == maxdensity) break
        }
    }
}
new_colonies <- new_colonies[-1, ]
Any help appreciated!
Thanks
Francesco
In your case I wouldn't worry about deleting the points as you go; skipping is the critical thing. I also wouldn't build up a new data.frame piece by piece like you seem to be doing. Both of those things slow you down a lot. Having a selection vector is much more efficient (perhaps part of the data.frame, that you set to FALSE beforehand):
df$sel <- FALSE
Now, when you go through, you set df$sel to TRUE for each item you want to keep. Just skip to the next cell when you find your 10. Deleting values as you go will be time consuming and memory intensive, as will slowly growing a new data.frame. When you're all done going through them, you can just select your data based on the selection column:
df <- df[ df$sel, ]
(or maybe make a copy of the data.frame at that point)
You also might want to use the dist function to calculate a matrix of distances.
from ?dist
"This function computes and returns the distance matrix computed by using the specified distance measure to compute the distances between the rows of a data matrix."
I'm assuming you are doing something sufficiently complicated that the for-loop is actually required...
So here's one rather simple approach: first just gather the rows to delete (or keep), and then delete the rows afterwards. Typically this will be much faster too since you don't modify the data.frame on each loop iteration.
df <- generateTheDataFrame()
keepRows <- rep(TRUE, nrow(df))

for (i in seq_len(nrow(df))) {
    rows <- findRowsToDelete(df, df[i, ])
    keepRows[rows] <- FALSE
}

# Delete afterwards
df <- df[keepRows, ]
...and if you really need to work on the shrunk data in each iteration, just change the for-loop part to:
for (i in seq_len(nrow(df))) {
    if (keepRows[i]) {
        rows <- findRowsToDelete(df[keepRows, ], df[i, ])
        keepRows[rows] <- FALSE
    }
}
I'm not exactly clear on why you're looping. If you could describe what kind of conditions you're checking, there might be a nice vectorized way of doing it.
However, as a very simple fix, have you considered looping through the data frame backwards?