how can i improve bulk calculation from file data - python-2.7

I have a file of binary values. The section I am looking at is 4 byte int with the values in the pattern of MW1, MVAR1, MW2, MVAR2,...
I read the values in with
temp = array.array("f")
temp.fromfile(file, length *2)
mw_mvar = temp.tolist()
I then calculate the magnitude like this.
mag = [0] * length
for x in range(0, length * 2, 2):
    a = mw_mvar[x]
    b = mw_mvar[x + 1]
    mag[x // 2] = sqrt(a*a + b*b)
The calculations (not the read) are doubling the total run time of my script. I know there is (theoretically) a way to do this faster because I am mimicking a script that ultimately calls Fortran (a pyd that calls functions in Fortran DLLs, I think), which does this calculation with negligible effect on run time.
This is the best I can come up with. Any suggestions for improvements?
I have also tried math.pow(), **.5, and **2, with no difference.
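One possible direction, offered as a sketch rather than a measured fix: split the interleaved values with slices and let map() drive math.hypot, so the per-pair loop is not written in Python. The helper name magnitudes and the sample values are illustrative only; if NumPy is available, np.hypot on the two slices would be the analogous vectorized call. Whether this closes the gap to the Fortran version would need timing on the real data.

```python
import array
from math import hypot

def magnitudes(values):
    """Return sqrt(a*a + b*b) for each (a, b) pair of an interleaved sequence.

    `values` is laid out as [MW1, MVAR1, MW2, MVAR2, ...]; the name
    `magnitudes` is a placeholder, not something from the original script.
    """
    # values[0::2] are the MW entries, values[1::2] the MVAR entries.
    return list(map(hypot, values[0::2], values[1::2]))

# Stand-in for the array filled by temp.fromfile(file, length * 2).
temp = array.array("f", [3.0, 4.0, 6.0, 8.0])
mag = magnitudes(temp)
print(mag)
```

The slices work directly on the array.array object, so the tolist() conversion is not needed for this step.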

With no luck improving the calculations, I went around the problem. I realised that I only needed 1% of those calculated values, so I created a class to calculate them on demand. It was important (to me) that the resulting code act as if it were a list of calculated values. A lot of the remainder of the process uses the values, and different versions of the data are pre-calculated. The class means I don't need a set of procedures for each version of the data.
class mag:
    def __init__(self, mw_mvar):
        self._mw_mvar = mw_mvar
        #_sgn = sgn
    def __len__(self):
        # one magnitude per (MW, MVAR) pair
        return len(self._mw_mvar) // 2
    def __getitem__(self, item):
        return sqrt(self._mw_mvar[2*item] ** 2 + self._mw_mvar[2*item+1] ** 2)
PS: this could also be done in a function that takes both versions, but I would have had to make more changes to the overall script.
def magnitude(a, b, x):  # the function name is a placeholder
    if b[x] == 0:
        return a[x]
    else:
        return sqrt(a[x]**2 + b[x]**2)


thinkscript if statement failure

The thinkscript if statement fails to branch as expected in some cases. The following test case can be used to reproduce this bug / defect.
It is shared via Grid containing chart and script
To cut the long story short, a possible workaround in some cases is to use the if-expression which is a function, which may be slower, potentially leading to Script execution timeout in scans.
This fairly nasty bug in thinkscript prevents me from writing some scans and studies the way I need to.
Following is some sample code that shows the problem on a chart.
input price = close;
input smoothPeriods = 20;
def output = Average(price, smoothPeriods);
# Get the current offset from the right edge from BarNumber()
# BarNumber(): The current bar number. On a chart, we can see that the number increases
# from left 1 to number of bars e.g. 140 at the right edge.
def barNumber = BarNumber();
def barCount = HighestAll(barNumber);
# rightOffset: 0 at the right edge, i.e. at the rightmost bar,
# increasing from right to left.
def rightOffset = barCount - barNumber;
# Prepare a lookup table:
def lookup;
if (barNumber == 1) {
    lookup = -1;
} else {
    lookup = 53;
}
# This script gets the minimum value from data in the offset range between startIndex
# and endIndex. It serves as a functional but not direct replacement for the
# GetMinValueOffset function where a dynamic range is required. Expect it to be slow.
script getMinValueBetween {
    input data = low;
    input startIndex = 0;
    input endIndex = 0;
    plot minValue = fold index = startIndex to endIndex with minRunning = Double.POSITIVE_INFINITY do Min(GetValue(data, index), minRunning);
}
# Call this only once at the last bar.
script buildValue {
    input lookup = close;
    input offsetLast = 0;
    # Do an indirect lookup
    def lookupPosn = 23;
    def indirectLookupPosn = GetValue(lookup, lookupPosn);
    # lowAtIndirectLookupPosn is assigned incorrectly. The if statement APPEARS to be executed
    # as if indirectLookupPosn was 0 but indirectLookupPosn is NOT 0 so the condition
    # for the first branch should be met!
    def lowAtIndirectLookupPosn;
    if (indirectLookupPosn > offsetLast) {
        lowAtIndirectLookupPosn = getMinValueBetween(low, offsetLast, indirectLookupPosn);
    } else {
        lowAtIndirectLookupPosn = close[offsetLast];
    }
    plot testResult = lowAtIndirectLookupPosn;
}
plot debugLower;
if (rightOffset == 0) {
    debugLower = buildValue(lookup);
} else {
    debugLower = 0;
}
declare lower;
To prepare the chart for the stock ADT, please set custom time frame:
10/09/18 to 10/09/19, aggregation period 1 day.
The aim of the script is to find the low value of 4.25 on 08/14/2019.
I DO know that there are various methods to do this in thinkscript such as GetMinValueOffset().
Please let us not discuss alternative ways of finding the low or alternatives to the attached script, because I am not asking for help achieving the objective. I am reporting a bug, and I want to know what goes wrong and perhaps how to fix it. In other words, finding the low here is just an example to make the script easier to follow. It could be anything else that one wants a script to compute.
Please let me describe the script.
First it does some smoothing with a moving average. The result is:
def output;
Then the script defines the distance from the right edge so we can work with offsets:
def rightOffset;
Then the script builds a lookup table:
def lookup;
script getMinValueBetween {} is a little function that finds the low between two offset positions, in a dynamic way. It is needed because GetMinValueOffset() does not accept dynamic parameters.
Then we have script buildValue {}
This is where the error occurs. This script is executed at the right edge.
buildValue {} does an indirect lookup as follows:
First it goes into lookup where it finds the value 53 at lookupPosn = 23.
With 53, it finds the low between offset 53 and 0 by calling the script function getMinValueBetween().
It stores the value in def lowAtIndirectLookupPosn;
As you can see, this is very simple indeed - only 38 lines of code!
The problem is that lowAtIndirectLookupPosn contains the wrong value, as if the wrong branch of the if statement had been executed.
plot testResult should put out the low 4.25. Instead it puts out close[offsetLast], which is 6.26.
Quite honestly, this is a disaster because it is impossible to predict which if statements in your program will fail.
In a limited number of cases, the if-expression can be used instead of the if statement. However, the if-expression covers only a subset of use cases and it may execute with lower performance in scans. More importantly, it defeats the purpose of the if statement in an important case because it supports conditional assignment but not conditional execution. In other words, it executes both branches before assigning one of two values.
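The difference between conditional assignment and conditional execution can be illustrated outside thinkscript. The following is a Python analogy only, not thinkscript and not a claim about how the platform is implemented: a function-style selector receives both values already computed, while an if statement runs only one branch. All names here (cheap, expensive, if_expression) are made up for the illustration.

```python
calls = []  # records which branch computations actually ran

def cheap():
    calls.append("cheap")
    return 1

def expensive():
    calls.append("expensive")
    return 2

def if_expression(condition, then_value, else_value):
    """Function-style selection: both arguments are computed by the caller
    before this function runs, so it can only choose between finished values."""
    return then_value if condition else else_value

# Conditional assignment: both branches are evaluated, one value is kept.
eager = if_expression(True, cheap(), expensive())
eager_calls = list(calls)

# Conditional execution: only the selected branch runs.
del calls[:]
if True:
    lazy = cheap()
else:
    lazy = expensive()
lazy_calls = list(calls)

print(eager_calls, lazy_calls)
```

This is why the workaround can cost more in scans: the expensive branch is paid for even when its value is discarded.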

Sparse index use for optimization with Pyomo model

I have a Pyomo model connected to a Django-created website.
My decision variable has 4 indices and I have a huge number of constraints on it.
Since Pyomo takes a ton of time to read in the constraints with so many variables, I want to thin out the index set so that it only contains variables that could actually be 1 (I have some conditions on that).
I saw this post
Create a variable with sparse index in pyomo
and tried a for loop over all my conditions. I created a set "AllowedVariables" to use later inside my constraints.
But Django's server takes so long to create this set while performing the system check that it never finishes.
Currently I have this model:
model = AbstractModel()
model.x = Var(model.K, model.L, model.F, model.Z, domain=Boolean)
def ObjRule(model):
    # some rule, sense maximize
model.Obj = pyomo.environ.Objective(rule=ObjRule, sense=maximize)
def ARule(model, l):
    # only sum the variables whose index tuple is in AllowedVariables
    maxA = sum(model.x[k,l,f,z] for k in model.K for f in model.F
               for z in model.Z if (k,l,f,z) in model.AllowedVariables)
    return maxA <= 1
model.maxA = Constraint(model.L, rule=ARule)
The constraint is just an example; I have 15 more similar ones. I currently create "AllowedVariables" this way:
AllowedVariables = []
for k in model.K:
    for l in model.L:
        # ..... check all sorts of conditions, break if not valid
        AllowedVariables.append((k,l,f,z))
model.AllowedVariables = Set(initialize=AllowedVariables)
Using this, the Django server starts checking....and never stops
performing system checks...
Sadly, I need some restriction on the variables, or else reading the model for the solver will take way too long, since the constraints contain so many unnecessary variables that have to be 0 anyway.
Any ideas on how I can sparse my variable set?
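One way to approach this, sketched here in plain Python with no Pyomo or Django involved, is to build the allowed index tuples once from ordinary data and also group them by l, so that a per-l constraint never has to scan the full K x F x Z product. The function build_allowed, the predicate is_allowed and the toy ranges are all hypothetical stand-ins for the real conditions and data.

```python
from collections import defaultdict
from itertools import product

def build_allowed(K, L, F, Z, is_allowed):
    """Return the allowed (k, l, f, z) tuples and the same tuples grouped by l.

    `is_allowed` is a hypothetical predicate standing in for the
    "all sorts of conditions" mentioned in the question.
    """
    allowed = []
    by_l = defaultdict(list)
    for k, l, f, z in product(K, L, F, Z):
        if is_allowed(k, l, f, z):
            allowed.append((k, l, f, z))
            by_l[l].append((k, l, f, z))
    return allowed, by_l

# Toy data: keep a tuple only when k == l and f <= z.
K, L, F, Z = range(3), range(2), range(2), range(2)
allowed, by_l = build_allowed(K, L, F, Z, lambda k, l, f, z: k == l and f <= z)
print(len(allowed), {l: len(ts) for l, ts in by_l.items()})
```

With something like this in hand, the list could presumably be passed to Set(initialize=allowed, dimen=4), the variable declared over that set instead of the four full sets, and a rule such as ARule could sum over by_l[l] directly. That wiring is an assumption about how the model would be restructured, not something tested here.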

Changing index reference for Set in Pyomo

Is there a way to change the Set indexing from 1 indexed to 0 indexed in Pyomo? It's very difficult to keep everything straight when you are dealing with multiple objects where Pyomo is 1 referenced and everything else from Python is 0 referenced.
The reason for this is to generate a model fitting routine for multiple circuit devices. Instead of recreating the entire model over and over, I want to define it once with an AbstractModel. Then I can just reload the data and resolve for each device.
In my objective function, I'm defining intermediate values using list comprehension. Once these intermediate values are generated, they are now 0 referenced. An example of what I'm doing is below. As you can see, I have to have some parameters declared with [i] and others with [i-1]. It just becomes difficult and confusing when the functions become large. It would make a whole lot more sense if everything was just 0 referenced so that it was consistent with standard Python code. I was hoping there was some easy option or setting to declare whether a Set is 0 or 1 referenced.
y11intre = [1 / m.Ra[1] + 1 / m.Rb[1] for i in m.n]
y11intim = [m.w[i] * (m.Ca[1] + m.Cb[1]) for i in m.n]
y12intre = ...
...
z11intre = [-y22intim[i-1] * ... for i in m.n]
...
z11re = [m.Rae[1] + z11intre[i-1] for i in m.n]
z11im = [m.w[i] * m.Lae[1] + z11intim[i-1] for i in m.n]
You can provide the starting and stopping point to RangeSet to give you the values you want:
m.r = RangeSet(0,5) # [0,1,2,3,4,5]
m.s = RangeSet(0,4) # [0,1,2,3,4]
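Separately from where the Set starts, the [i] versus [i-1] mismatch can be avoided by keeping intermediate values in dicts keyed by the same index as the Set, instead of in lists. The sketch below uses plain-Python stand-ins (n, w, Ca, Cb, Lae and their values are invented for illustration, and y_im and z_im are simplified placeholders rather than the real circuit expressions).

```python
# Plain-Python stand-ins for the Pyomo objects; all values are hypothetical.
n = range(1, 4)                     # plays the role of a 1-based Set m.n
w = {1: 10.0, 2: 20.0, 3: 30.0}     # plays the role of m.w
Ca, Cb = 0.5, 0.25                  # play the role of m.Ca[1] and m.Cb[1]
Lae = 2.0                           # plays the role of m.Lae[1]

# Dict comprehensions keep the Set's own keys, so no i-1 offset is needed.
y_im = {i: w[i] * (Ca + Cb) for i in n}
z_im = {i: w[i] * Lae + y_im[i] for i in n}

print(z_im)
```

The same code works unchanged if the index set starts at 0, which keeps the objective function independent of that choice.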

What's slowing down this piece of python code?

I have been trying to implement the Stupid Backoff language model (the description is available here, though I believe the details are not relevant to the question).
The thing is, the code's working and producing the result that is expected, but works slower than I expected. I figured out the part that was slowing down everything is here (and NOT in the training part):
def compute_score(self, sentence):
    length = len(sentence)
    assert length <= self.n
    if length == 1:
        word = tuple(sentence)
        return float(self.ngrams[length][word]) / self.total_words
    else:
        words = tuple(sentence[::-1])
        count = self.ngrams[length][words]
        if count == 0:
            return self.alpha * self.compute_score(sentence[1:])
        else:
            return float(count) / self.ngrams[length - 1][words[:-1]]

def score(self, sentence):
    """ Takes a list of strings as argument and returns the log-probability of the
    sentence using your language model. Use whatever data you computed in train() here.
    """
    output = 0.0
    length = len(sentence)
    for idx in range(length):
        if idx < self.n - 1:
            current_score = self.compute_score(sentence[:idx+1])
        else:
            current_score = self.compute_score(sentence[idx-self.n+1:idx+1])
        output += math.log(current_score)
    return output
self.ngrams is a nested dictionary that has n entries. Each of these entries is a dictionary of form (word_i, word_i-1, word_i-2.... word_i-n) : the count of this combination.
self.alpha is a constant that defines the penalty for going n-1.
self.n is the maximum length of the tuple that the program looks for in the dictionary self.ngrams. It is set to 3 (though setting it to 2 or even 1 doesn't change anything). It's weird because the Unigram and Bigram models work just fine in fractions of a second.
The answer that I am looking for is not a refactored version of my own code, but rather a tip which part of it is the most computationally expensive (so that I could figure out myself how to rewrite it and get the most educational profit from solving this problem).
Please, be patient, I am but a beginner (two months into the world of programming). Thanks.
UPD:
I timed the running time with the same data using time.time():
Unigram = 1.9
Bigram = 3.2
Stupid Backoff (n=2) = 15.3
Stupid Backoff (n=3) = 21.6
(It's on some bigger data than originally because of time.time's bad precision.)
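Whole-run timings like these show that something is slow but not which function is responsible. A profiler gives a per-function breakdown; the following is a minimal sketch using the standard cProfile and pstats modules (written for Python 3), with slow_sum as a made-up stand-in for the call being measured, such as model.score(sentence).

```python
import cProfile
import io
import pstats

def slow_sum(n):
    """Hypothetical workload standing in for the code under investigation."""
    return sum(i * i for i in range(n))

profiler = cProfile.Profile()
profiler.enable()
slow_sum(100000)
profiler.disable()

# Collect the report in a string, sorted by cumulative time, top 5 entries.
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(5)
report = buf.getvalue()
print(report)
```

The rows with the highest cumulative time point at the functions worth reading first.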
If the sentence is very long, most of the code that's actually running is here:
def score(self, sentence):
    for idx in range(len(sentence)): # should use xrange in Python 2!
        self.compute_score(sentence[idx-self.n+1:idx+1])

def compute_score(self, sentence):
    words = tuple(sentence[::-1])
    count = self.ngrams[len(sentence)][words]
    if count == 0:
        self.compute_score(sentence[1:])
    else:
        self.ngrams[len(sentence) - 1][words[:-1]]
That's not meant to be working code--it just removes the unimportant parts.
The flow in the critical path is therefore:
For each word in the sentence:
  Call compute_score() on that word plus the preceding 2. This creates a new list of length 3. You could avoid that with itertools.islice().
  Construct a 3-tuple with the words reversed. This creates a new tuple. You could avoid that by passing the -1 step argument when making the slice outside this function.
  Look up in self.ngrams, a nested dict, with the first key being a number (might be faster if this level were a list; there are only three keys anyway?), and the second being the tuple just created.
  Then either recurse with the first word removed, i.e. make a new tuple (sentence[2], sentence[1]), or
  do another lookup in self.ngrams, implicitly creating another new tuple (words[:-1]).
In summary, I think the biggest problem you have is the repeated and nested creation and destruction of lists and tuples.
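To make that concrete, here is one possible shape for the hot path, offered as a sketch and not as a drop-in replacement: the sentence is reversed into a tuple once, each window is a single slice of that tuple, and backing off shortens the tuple in a loop instead of recursing on a fresh list. It assumes, as in the question, that keys in ngrams are tuples with the newest word first; writing the functions as standalone (not methods) and the toy counts are choices made only for this illustration.

```python
import math

def compute_score(ngrams, total_words, alpha, words):
    """Score one window; `words` is a tuple with the newest word first."""
    penalty = 1.0
    while len(words) > 1:
        count = ngrams[len(words)].get(words, 0)
        if count:
            # Same ratio as the original: count over the shorter n-gram's count.
            return penalty * float(count) / ngrams[len(words) - 1][words[:-1]]
        penalty *= alpha       # same factor the recursive version applied
        words = words[:-1]     # back off by dropping the oldest word
    return penalty * float(ngrams[1].get(words, 0)) / total_words

def score(ngrams, total_words, alpha, n, sentence):
    rev = tuple(reversed(sentence))    # reverse once for the whole sentence
    last = len(rev) - 1
    total = 0.0
    for idx in range(len(rev)):
        start = last - idx             # position of word idx in rev
        words = rev[start:start + min(n, idx + 1)]
        total += math.log(compute_score(ngrams, total_words, alpha, words))
    return total

# Toy counts for the sentence a b a b, keys stored newest word first.
ngrams = {1: {("a",): 2, ("b",): 2}, 2: {("b", "a"): 2, ("a", "b"): 1}}
result = score(ngrams, 4, 0.4, 2, ["a", "b", "a", "b"])
print(result)
```

Slicing a tuple still allocates, so this reduces the number of temporary objects per word and does not eliminate them; profiling before and after would show whether it matters on real data.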

Python: How to create a function that uses its own output and uses an array of random generated numbers

Disclaimer: I am quite new to Python and programming as a whole.
I have been trying to create a function to generate random stock prices using the following:
New stock price = previous price + (previous price*(return + (volatility * random number)))
The return and volatility numbers are fixed. Also, I have generated N random numbers.
The problem is how to create a function whose output is re-used as its own input, i.e. as the previous price.
Basically, I want an array of NEW stock prices generated from this formula, where the previous price variable is the function's own previous output.
I have been trying to do this for a couple of days and I am sure I am not fully equipped to do it (given that I am a newbie) but ANY HELP would really really be more than appreciated...!!!
Please any help would be useful.
import math
import random
initial_price = 10
return_daily = 0.12 / 252
vol_daily = 0.30 / (math.sqrt(252))
random_numbers = []
for i in range (5):
    random_numbers.append(random.gauss(0,1))
def stock_prices(random_numbers):
    prices = []
    for i in range(0,len(random_numbers)):
        calc = initial_price + (initial_price * (return_daily+(vol_daily*random_numbers[i])))
        prices.append(calc)
    return prices
You can't really use recursion here, because you don't have a break condition that ends the recursion. You could construct one by passing an additional counter parameter that specifies how many more levels to recurse, but that would not be optimal in my opinion.
Instead, I recommend using a for loop that repeats a fixed number of times that you specify. This way you can add one new price to a list per iteration and access the previous one to calculate it:
first_price = 100
list_length = 20
def price_formula(previous_price):
    return previous_price * 1.2 # you would replace this with your actual calculation
prices = [first_price] # create list with initial item
for i in range(list_length): # repeats exactly 'list_length' times, turn number is 'i'
    prices.append(price_formula(prices[-1])) # append new price to list
    # prices[-1] always returns the last element of the list, i.e. the previously added one.
print("\n".join(map(str, prices)))
My optimization of your code snippet:
import math
import random
initial_price = 10
return_daily = 0.12 / 252
vol_daily = 0.30 / (math.sqrt(252))
def stock_prices(number_of_prices):
    prices = [initial_price]
    for i in range(0, number_of_prices):
        # each new price is built from the previous one, prices[-1]
        prices.append(prices[-1] + (prices[-1] * (return_daily+(vol_daily*random.gauss(0,1)))))
    return prices
This is a classic Markov process: the present value depends on its previous value, and only on its previous value. A natural fit in this case is an iterator, which can produce each new value of such a process from the last one.
Learn about how iterators can be generated here http://anandology.com/python-practice-book/iterators.html
Now that you have some understanding of how iterators work, you can create your own iterator for your problem. You need a class that implements the __iter__() method and the next() method (next() is the Python 2 name; in Python 3 the method is called __next__()).
Something like this:
import random
from math import sqrt

class Abc:
    def __init__(self, initPrice):
        self.v = initPrice # This is the initial price
        self.dailyRet = 0.12/252
        self.dailyVol = 0.3/sqrt(252)

    def __iter__(self):
        return self

    def next(self):
        self.v += self.v * (self.dailyRet + self.dailyVol*random.gauss(0,1))
        return self.v

    __next__ = next # Python 3 name for the same method

if __name__ == '__main__':
    initPrice = 10
    temp = Abc(initPrice)
    for i in range(10):
        print(temp.next())
This will give output similar to the following (the values are random, so every run differs):
> python test.py
10.3035353791
10.3321905359
10.3963790497
10.5354048937
10.6345509793
10.2598381299
10.3336476153
10.6495914319
10.7915999185
10.6669136891
Note that this iterator never raises StopIteration, so if you loop over it directly (for example with for price in temp) it will never end. However, a stop condition is not difficult to implement and I hope you try to implement it ...
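For comparison, a generator function gets the stop condition for free, because the iteration ends when the function returns. This is a sketch under stated assumptions: the name simulate_prices, its parameters and the optional rng argument (used here only to make the run repeatable) are not from the answers above, while the update formula and the 0.12, 0.30 and 252 constants are the ones used in the thread.

```python
import math
import random

def simulate_prices(initial_price, n_steps, annual_return=0.12,
                    annual_vol=0.30, rng=random):
    """Yield n_steps prices, each computed from the previous one.

    The name, the parameters and `rng` are illustrative assumptions.
    """
    daily_ret = annual_return / 252
    daily_vol = annual_vol / math.sqrt(252)
    price = initial_price
    for _ in range(n_steps):
        # new price = previous + previous * (return + volatility * random number)
        price += price * (daily_ret + daily_vol * rng.gauss(0, 1))
        yield price

# A seeded generator makes the sequence repeatable between runs.
prices = list(simulate_prices(10.0, 5, rng=random.Random(42)))
print(prices)
```

Because the generator stops after n_steps values, it can be used directly in a for loop or passed to list() without risk of running forever.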