Python: How to create a function that uses its own output and uses an array of random generated numbers - python-2.7

Disclaimer: I am quite new to Python and programming as a whole.
I have been trying to create a function to generate random stock prices using the following:
New stock price = previous price + (previous price*(return + (volatility * random number)))
The return and volatility numbers are fixed. Also, I have generated the random numbers for N times.
The problem is how to create a function that has the output re-used again on itself as an input previous price.
Basically to have an array of NEW stock prices generated from this formula and the previous price variable is the output of the function on itself.
I have been trying to do this for a couple of days and I am sure I am not fully equipped to do it (given that I am a newbie) but ANY HELP would really really be more than appreciated...!!!
Please any help would be useful.
import math
import random
initial_price = 10
return_daily = 0.12 / 252
vol_daily = 0.30 / (math.sqrt(252))
random_numbers = []
for i in range(5):
    random_numbers.append(random.gauss(0, 1))
def stock_prices(random_numbers):
    prices = []
    for i in range(0, len(random_numbers)):
        # every step still starts from initial_price; this is the part I cannot get right
        calc = initial_price + (initial_price * (return_daily + (vol_daily * random_numbers[i])))
        prices.append(calc)
    return prices

You can't really use recursion here, because you don't have a break condition that ends the recursion. You could construct one by passing an additional counter parameter that specifies how many more levels to recurse, but that would be not optimal in my opinion.
Instead, I recommend you to use a for loop that gets repeated a fixed number of times you can specify. This way you can add one new price value to a list per loop iteration step and access the previous one to calculate it:
first_price = 100
list_length = 20
def price_formula(previous_price):
    return previous_price * 1.2 # you would replace this with your actual calculation
prices = [first_price] # create list with initial item
for i in range(list_length): # repeats exactly 'list_length' times, turn number is 'i'
    prices.append(price_formula(prices[-1])) # append new price to list
    # prices[-1] always returns the last element of the list, i.e. the previously added one.
print("\n".join(map(str, prices)))
My optimization of your code snippet:
import math
import random
initial_price = 10
return_daily = 0.12 / 252
vol_daily = 0.30 / (math.sqrt(252))
def stock_prices(number_of_prices):
    prices = [initial_price]
    for i in range(0, number_of_prices):
        # each new price is computed from the last price in the list
        prices.append(prices[-1] + (prices[-1] * (return_daily + (vol_daily * random.gauss(0, 1)))))
    return prices
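If you want to keep feeding in the random numbers you generated beforehand, as in the question, the same loop can take them as an argument. This is only a sketch written in Python 3 syntax; the parameter names (shocks, return_daily, vol_daily) and the seeded random.Random(0) are my own choices for illustration, not something from the original code.

```python
import math
import random

def stock_prices(initial_price, shocks, return_daily, vol_daily):
    """Build a price path where each price is computed from the previous one."""
    prices = [initial_price]
    for shock in shocks:
        previous = prices[-1]  # output of the previous step
        prices.append(previous + previous * (return_daily + vol_daily * shock))
    return prices

# Pre-generate the random numbers once (seeded here only so runs are repeatable).
rng = random.Random(0)
random_numbers = [rng.gauss(0, 1) for _ in range(5)]

prices = stock_prices(10, random_numbers, 0.12 / 252, 0.30 / math.sqrt(252))
print(len(prices))  # 6: the initial price plus one price per random number
```

Passing the numbers in makes the function easy to check by hand: with all shocks set to 0 the price simply grows by return_daily each step.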

This is the classic Markov process. The present value depends upon its previous value, and only its previous value. The best thing to use in this case is what is called an iterator. You can write your own iterator to model the Markov process.
Learn about how iterators can be generated here http://anandology.com/python-practice-book/iterators.html
Now that you have some understanding of how iterators work, you can create your own iterators for your problem. You need a class that implements the __iter__() method and the next() method.
Something like this:
import random
from math import sqrt
class Abc:
    def __init__(self, initPrice):
        self.v = initPrice # This is the initial price
        self.dailyRet = 0.12/252
        self.dailyVol = 0.3/sqrt(252)
        return
    def __iter__(self): return self
    def next(self):
        self.v += self.v * (self.dailyRet + self.dailyVol*random.gauss(0,1) )
        return self.v
if __name__ == '__main__':
    initPrice = 10
    temp = Abc(initPrice)
    for i in range(10):
        print temp.next()
This will give the output:
> python test.py
10.3035353791
10.3321905359
10.3963790497
10.5354048937
10.6345509793
10.2598381299
10.3336476153
10.6495914319
10.7915999185
10.6669136891
Note that this does not have the stop iteration command, so if you use this incorrectly, you may get into trouble. However, that is not difficult to implement and I hope you try to implement it ...
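One way to get the stopping behaviour without writing StopIteration handling by hand is a generator function, which ends on its own once its loop finishes. The following is a rough sketch in Python 3 syntax, not a drop-in replacement for the class above; price_path, steps and the optional rng argument are names I made up for the example.

```python
import random
from math import sqrt

def price_path(initial_price, steps, annual_return=0.12, annual_vol=0.30, rng=None):
    """Yield `steps` simulated prices, then stop (StopIteration is raised for us)."""
    if rng is None:
        rng = random.Random()
    daily_ret = annual_return / 252
    daily_vol = annual_vol / sqrt(252)
    price = initial_price
    for _ in range(steps):
        # the new price depends only on the previous price
        price += price * (daily_ret + daily_vol * rng.gauss(0, 1))
        yield price

prices = list(price_path(10, 10, rng=random.Random(42)))
print(len(prices))  # 10
```

Because the generator is finite, it is safe to use in a plain for loop or to pass to list().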

Related

What's slowing down this piece of python code?

I have been trying to implement the Stupid Backoff language model (the description is available here, though I believe the details are not relevant to the question).
The thing is, the code's working and producing the result that is expected, but works slower than I expected. I figured out the part that was slowing down everything is here (and NOT in the training part):
def compute_score(self, sentence):
    length = len(sentence)
    assert length <= self.n
    if length == 1:
        word = tuple(sentence)
        return float(self.ngrams[length][word]) / self.total_words
    else:
        words = tuple(sentence[::-1])
        count = self.ngrams[length][words]
        if count == 0:
            return self.alpha * self.compute_score(sentence[1:])
        else:
            return float(count) / self.ngrams[length - 1][words[:-1]]
def score(self, sentence):
    """ Takes a list of strings as argument and returns the log-probability of the
        sentence using your language model. Use whatever data you computed in train() here.
    """
    output = 0.0
    length = len(sentence)
    for idx in range(length):
        if idx < self.n - 1:
            current_score = self.compute_score(sentence[:idx+1])
        else:
            current_score = self.compute_score(sentence[idx-self.n+1:idx+1])
        output += math.log(current_score)
    return output
self.ngrams is a nested dictionary that has n entries. Each of these entries is a dictionary of form (word_i, word_i-1, word_i-2.... word_i-n) : the count of this combination.
self.alpha is a constant that defines the penalty for going n-1.
self.n is the maximum length of that tuple that the program is looking for in the dictionary self.ngrams. It is set to 3 (though setting it to 2 or even 1 doesn't change anything). It's weird because the Unigram and Bigram models work just fine in fractions of a second.
The answer that I am looking for is not a refactored version of my own code, but rather a tip which part of it is the most computationally expensive (so that I could figure out myself how to rewrite it and get the most educational profit from solving this problem).
Please, be patient, I am but a beginner (two months into the world of programming). Thanks.
UPD:
I timed the running time with the same data using time.time():
Unigram = 1.9
Bigram = 3.2
Stupid Backoff (n=2) = 15.3
Stupid Backoff (n=3) = 21.6
(It's on some bigger data than originally because of time.time's bad precision.)
If the sentence is very long, most of the code that's actually running is here:
def score(self, sentence):
    for idx in range(len(sentence)): # should use xrange in Python 2!
        self.compute_score(sentence[idx-self.n+1:idx+1])
def compute_score(self, sentence):
    words = tuple(sentence[::-1])
    count = self.ngrams[len(sentence)][words]
    if count == 0:
        self.compute_score(sentence[1:])
    else:
        self.ngrams[len(sentence) - 1][words[:-1]]
That's not meant to be working code--it just removes the unimportant parts.
The flow in the critical path is therefore:
For each word in the sentence:
Call compute_score() on that word plus the preceding 2. This creates a new list of length 3. You could avoid that with itertools.islice().
Construct a 3-tuple with the words reversed. This creates a new tuple. You could avoid that by passing the -1 step argument when making the slice outside this function.
Look up in self.ngrams, a nested dict, with the first key being a number (might be faster if this level were a list; there are only three keys anyway?), and the second being the tuple just created.
Recurse with the first word removed, i.e. make a new tuple (sentence[2], sentence[1]), or
Do another lookup in self.ngrams, implicitly creating another new tuple (words[:-1]).
In summary, I think the biggest problem you have is the repeated and nested creation and destruction of lists and tuples.
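Since you asked for a pointer rather than a rewrite: the standard library profiler can confirm where the time goes before you change anything. Below is a small sketch in Python 3 syntax. build_tuples is a toy stand-in I wrote for the slicing and reversing in your loop (it is not your model), and profile_call is a hypothetical helper name.

```python
import cProfile
import io
import pstats

def build_tuples(words, n=3):
    """Toy stand-in for the hot loop: slice a window and reverse it for every word."""
    out = []
    for idx in range(len(words)):
        window = words[max(0, idx - n + 1):idx + 1]  # new list on every iteration
        out.append(tuple(window[::-1]))              # new reversed list plus a new tuple
    return out

def profile_call(func, *args):
    """Run func under cProfile and return (result, text report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()

result, report = profile_call(build_tuples, "the quick brown fox".split())
print(report)
```

Running your real score() through something like profile_call should show how many times compute_score is entered and how much time is spent inside it, which is the number to watch while you experiment.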

how can i improve bulk calculation from file data

I have a file of binary values. The section I am looking at is 4 byte int with the values in the pattern of MW1, MVAR1, MW2, MVAR2,...
I read the values in with
temp = array.array("f")
temp.fromfile(file, length *2)
mw_mvar = temp.tolist()
I then calculate the magnitude like this.
mag = [0] * length
for x in range(0,length * 2, 2):
    a = mw_mvar[x]
    b = mw_mvar[x + 1]
    mag[(x / 2)] = sqrt(a*a + b*b)
The calculations (not the read) are doubling the total run time of my script. I know there is (theoretically) a way to do this faster because I am mimicking a script that ultimately calls Fortran (a pyd that calls functions in Fortran DLLs, I think), which is able to do this calculation with negligible effect on run time.
This is the best I can come up with. Any suggestions for improvements?
I have also tried math.pow(), **.5, **2 with no differences.
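One variation that may be worth timing is to pair the values up with slices and zip, and let math.hypot do the square root of the sum of squares. This is a sketch in Python 3 syntax and I have not measured it against the loop above, so treat any speed-up as something to verify on the real data; magnitudes is a name chosen for the example.

```python
from math import hypot

def magnitudes(mw_mvar):
    """Return sqrt(mw**2 + mvar**2) for each (MW, MVAR) pair in a flat sequence."""
    # mw_mvar[0::2] holds the MW values, mw_mvar[1::2] the MVAR values
    return [hypot(mw, mvar) for mw, mvar in zip(mw_mvar[0::2], mw_mvar[1::2])]

print(magnitudes([3.0, 4.0, 6.0, 8.0]))  # [5.0, 10.0]
```

This avoids the index arithmetic and the per-element list assignment, which is where a pure Python loop tends to spend its time.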
With no luck improving the calculations, I went around the problem. I realised that I only needed 1% of those calculated values, so I created a class to calculate them on demand. It was important (to me) that the resulting code behave as if it were a list of calculated values. A lot of the remainder of the process uses the values, and different versions of the data are pre-calculated. The class means I don't need a set of procedures for each version of the data.
class mag:
    def __init__(self, mw_mvar):
        self._mw_mvar = mw_mvar
        #_sgn = sgn
    def __len__(self):
        # one magnitude per (MW, MVAR) pair
        return len(self._mw_mvar) // 2
    def __getitem__(self, item):
        return sqrt(self._mw_mvar[2*item] ** 2 + self._mw_mvar[2*item+1] ** 2)
PS: this could also be done in a function that takes both versions, but I would have had to make more changes to the overall script.
def magnitude(a, b, x): # 'magnitude' is just a placeholder name
    if b[x] == 0:
        return a[x]
    else:
        return sqrt(a[x]**2 + b[x]**2)

print sum of duplicate numbers and product of non duplicate numbers from the list

I am new to Python. I am trying to print the sum of all duplicate numbers and the product of all non-duplicate numbers from a Python list. For example:
list = [2,2,4,4,5,7,8,9,9]. What I want is sum = 2+2+4+4+9+9 and product = 5*7*8.
There are pythonic one liners that can do this but here is an explicit way you might find easier to understand.
num_list = [2,2,4,4,5,7,8,9,9]
sum_dup = 0
product = 1
for n in num_list:
    if num_list.count(n) == 1:
        product *= n
    else:
        sum_dup += n
Also side note, don't call your list the name "list", it interferes with the builtin name of the list type.
count is useful for this. Sum is built in, but there is no built in "product", so using reduce is the easiest way to do this.
from functools import reduce
import operator
lst = [2,2,4,4,5,7,8,9,9]
the_sum = sum([x for x in lst if lst.count(x)>1])
the_product = reduce(operator.mul, [x for x in lst if lst.count(x)==1])
Use a for loop to read a number from the list. Create a variable and assign the number to it, then read another number and compare the two using an if statement. If they are the same, add them to the sum (for example sameNumSum += number); otherwise multiply them into the product. Before the for loop, create these two variables and initialize them. I have only given you the algorithm; you can turn it into your own code. Hope that helps.
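For longer lists, calling count() for every element rescans the list each time. A possible alternative is to count everything once with collections.Counter and then do the sum and product in a second pass. This is a sketch in Python 3 syntax; sum_and_product is a name I picked for the example.

```python
from collections import Counter
from functools import reduce
import operator

def sum_and_product(numbers):
    """Return (sum of values that occur more than once, product of values that occur once)."""
    counts = Counter(numbers)  # one pass to count every value
    sum_dup = sum(n for n in numbers if counts[n] > 1)
    # the initial value 1 keeps reduce() from failing when nothing is unique
    product = reduce(operator.mul, (n for n in numbers if counts[n] == 1), 1)
    return sum_dup, product

print(sum_and_product([2, 2, 4, 4, 5, 7, 8, 9, 9]))  # (30, 280)
```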

Finding length of list without using the 'len' function in python

Part of my high school assignment is to make a function that will find the average of a list of floating point numbers. We can't use len and such, so something like sum(numList)/float(len(numList)) isn't an option for me. I've spent an hour researching and racking my brain for a way to find the list length without using the len function, and I've got nothing, so I was hoping to either be shown how to do it or be pointed in the right direction. Help me Stack Overflow, you're my only hope. :)
Use a loop to add up the values from the list, and count them at the same time:
def average(numList):
    total = 0
    count = 0
    for num in numList:
        total += num
        count += 1
    return total / count
If you might be passed an empty list, you might want to check for that first and either return a predetermined value (e.g. 0), or raise a more helpful exception than the ZeroDivisionError you'll get if you don't do any checking.
If you're using Python 2 and the list might be all integers, you should either put from __future__ import division at the top of the file, or convert one of total or count to a float before doing the division (initializing one of them to 0.0 would also work).
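Putting those two notes together, a version that guards against the empty list and always divides as floats could look roughly like this. It is only a sketch: raising ValueError is one possible choice (returning a default would be another), and starting total at 0.0 is what forces float division on Python 2 as well.

```python
def average(num_list):
    """Average of a list without len(); raises ValueError for an empty list."""
    total = 0.0  # float start value, so the division below is never integer division
    count = 0
    for num in num_list:
        total += num
        count += 1
    if count == 0:
        raise ValueError("average() needs at least one number")
    return total / count

print(average([1, 2, 3]))  # 2.0
```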
Might as well show how to do it with a while loop since it's another opportunity to learn.
Normally, you won't need counter variable(s) inside of a for loop. However, there are certain cases where it's helpful to keep a count as well as retrieve the item from the list and this is where enumerate() comes in handy.
Basically, the below solution is what @Blckknght's solution is doing internally.
def average(items):
    """
    Takes in a list of numbers and finds the average.
    """
    if not items:
        return 0
    # iter() creates an iterator.
    # an iterator gives you the .next() method
    # (called here through the built-in next()),
    # which returns the next item in the sequence of items.
    it = iter(items)
    count = 0
    total = 0
    while True:
        try:
            # when there are no more items in the list,
            # next() raises StopIteration
            total += next(it)
        except StopIteration:
            # this gives us an opportunity
            # to break out of the infinite loop
            break
        # since the StopIteration will be raised
        # before a value is returned, we don't want
        # to increment the counter until after
        # a valid value is retrieved
        count += 1
    # perform the normal average calculation
    return total / float(count)
def length_of_list(my_list):
    if not my_list:
        return 0
    return 1+length_of_list(my_list[1:])
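The recursive version copies the rest of the list on every call and will hit Python's recursion limit (1000 frames by default) on long lists. A loop-based sketch avoids both problems; count_items is just a name used for this example.

```python
def count_items(items):
    """Count the elements of any iterable without calling len()."""
    count = 0
    for _ in items:  # the value itself is not needed, only the fact that there is one
        count += 1
    return count

print(count_items([1.5, 2.5, 3.0]))  # 3
```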

Enforce algebraic relationships between object attributes with SymPy

I'm interested in using SymPy to augment my engineering models. Instead of defining a rigid set of inputs and outputs, I'd like for the user to simply provide everything they know about a system, then apply that data to an algebraic model which calculates the unknowns (if it has enough data).
For an example, say I have some stuff which has some mass, volume, and density. I'd like to define a relationship between those parameters (density = mass / volume) such that when the user has provided enough information (any 2 variables), the 3rd variable is automatically calculated. Finally, if any value is later updated, one of the other values should change to preserve the relationship. One of the challenges with this system is that when there are multiple independent variables, there would need to be a way to specify which independent variable should change to satisfy the requirement.
Here's some working code that I have currently:
from sympy import *
class Stuff(object):
    def __init__(self, *args, **kwargs):
        #String of variables
        varString = 'm v rho'
        #Initialize Symbolic variables
        m, v, rho = symbols(varString)
        #Define density equation
        # "rho = m / v" becomes "rho - m / v = 0" which means the eqn. is rho - m / v
        # This is because solve() assumes equation = 0
        density_eq = rho - m / v
        #Store the equation and the variable string for later use
        self._eqn = density_eq
        self._varString = varString
        #Get a list of variable names
        variables = varString.split()
        #Initialize parameters dictionary
        self._params = {name: None for name in variables}
    @property
    def mass(self):
        return self._params['m']
    @mass.setter
    def mass(self, value):
        param = 'm'
        self._params[param] = value
        self.balance(param)
    @property
    def volume(self):
        return self._params['v']
    @volume.setter
    def volume(self, value):
        param = 'v'
        self._params[param] = value
        self.balance(param)
    @property
    def density(self):
        return self._params['rho']
    @density.setter
    def density(self, value):
        param = 'rho'
        self._params[param] = value
        self.balance(param)
    def balance(self, param):
        #Get the list of all variable names
        variables = self._varString.split()
        #Get a copy of the list except for the recently changed parameter
        others = [name for name in variables if name != param]
        #Loop through the less recently changed variables
        for name in others:
            try:
                #Symbolically solve for the current variable
                eq = solve(self._eqn, [symbols(name)])[0]
                #Get a dictionary of variables and values to substitute for a numerical solution (will contain None for unset values)
                indvars = {symbols(n): self._params[n] for n in variables if n != name}
                #Set the parameter with the new numeric solution
                self._params[name] = eq.evalf(subs=indvars)
            except Exception, e:
                pass
if __name__ == "__main__":
    #Run some examples - need to turn these into actual tests
    stuff = Stuff()
    stuff.density = 0.1
    stuff.mass = 10.0
    print stuff.volume
    stuff = Stuff()
    stuff.mass = 10.0
    stuff.volume = 100.0
    print stuff.density
    stuff = Stuff()
    stuff.density = 0.1
    stuff.volume = 100.0
    print stuff.mass
    #increase volume
    print "Setting Volume to 200"
    stuff.volume = 200.0
    print "Mass changes"
    print "Mass {0}".format(stuff.mass)
    print "Density {0}".format(stuff.density)
    #Setting Mass to 15
    print "Setting Mass to 15.0"
    stuff.mass = 15.0
    print "Volume changes"
    print "Volume {0}".format(stuff.volume)
    print "Density {0}".format(stuff.density)
    #Setting Density to 0.5
    print "Setting Density to 0.5"
    stuff.density = 0.5
    print "Mass changes"
    print "Mass {0}".format(stuff.mass)
    print "Volume {0}".format(stuff.volume)
    print "It is impossible to let mass and volume drive density with this setup since either mass or volume will " \
          "always be recalculated first."
I tried to be as elegant as I could be with the overall approach and layout of the class, but I can't help but wonder if I'm going about it wrong - if I'm using SymPy in the wrong way to accomplish this task. I'm interested in developing a complex aerospace vehicle model with dozens/hundreds of interrelated properties. I'd like to find an elegant and extensible way to use SymPy to govern property relationships across the vehicle before I scale up from this fairly simple example.
I'm also concerned about how/when to rebalance the equations. I'm familiar with PyQt Signals and Slots which is my first thought for how to link dependent things together to trigger a model update (emit a signal when a value updates, which will be received by rebalancing functions for each system of equations that relies on that parameter?). Yeah, I really don't know the best way to do this with SymPy. Might need a bigger example to explore systems of equations.
Here's some thoughts on where I'm headed with this project. Just using mass as an example, I would like to define the entire vehicle mass as the sum of the subsystem masses, and all subsystem masses as the sum of component masses. In addition, mass relationships will exist between certain subsystems and components. These relationships will drive the model until more concrete data is provided. So if the default ratio of fuel mass to total vehicle mass is 50%, then specifying 100lb of fuel will size the vehicle at 200lb. However if I later specify that the vehicle is actually 210lb, I would want the relationship recalculated (let it become dependent since fuel mass and vehicle mass were set more recently, or because I specified that they are independent variables or locked or something). The next problem is iteration. When circular or conflicting relationships exist in a model, the model must be iterated on to hopefully converge on a solution. This is often the case with the vehicle mass model described above. If the vehicle gets heavier, there needs to be more fuel to meet a requirement, which causes the vehicle to get even heavier, etc. I'm not sure how to leverage SymPy in these situations.
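To make the iteration problem concrete, here is a rough sketch of the kind of loop I have in mind, written in plain Python 3 with no SymPy involved. It assumes a deliberately simple model (total mass = fixed dry mass + fuel, with fuel a fixed fraction of total mass); size_vehicle, dry_mass and fuel_fraction are hypothetical names used only for this illustration.

```python
def size_vehicle(dry_mass, fuel_fraction, tolerance=1e-9, max_iterations=1000):
    """Iterate total = dry_mass + fuel_fraction * total until it stops changing."""
    total = dry_mass  # first guess: no fuel at all
    for _ in range(max_iterations):
        new_total = dry_mass + fuel_fraction * total  # heavier vehicle needs more fuel
        if abs(new_total - total) < tolerance:
            return new_total
        total = new_total
    raise RuntimeError("mass model did not converge")

total = size_vehicle(dry_mass=100.0, fuel_fraction=0.5)
print(round(total, 6))  # 200.0
```

With a 50% fuel fraction and 100 lb of everything else, the loop settles at 200 lb, matching the example above. It only converges when fuel_fraction is below 1, which is the sort of condition I don't yet know how to express or check through SymPy.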
Any suggestions?
PS
Good explanation of the challenges associated with space launch vehicle design.
Edit: Changing the code structure based on goncalopp's suggestions...
from functools import partial
from sympy import solve, symbols, sympify
class Balanced(object):
    def __init__(self, variables, equationStr):
        self._variables = variables
        self._equation = sympify(equationStr)
        #Initialize parameters dictionary
        self._params = {name: None for name in self._variables}
        def var_getter(varname, self):
            return self._params[varname]
        def var_setter(varname, self, value):
            self._params[varname] = value
            self.balance(varname)
        #Attach one property per variable to the class
        for varname in self._variables:
            setattr(Balanced, varname, property(fget=partial(var_getter, varname),
                                                fset=partial(var_setter, varname)))
    def balance(self, recentlyChanged):
        #Get a copy of the list except for the recently changed parameter
        others = [name for name in self._variables if name != recentlyChanged]
        #Loop through the less recently changed variables
        for name in others:
            try:
                eq = solve(self._equation, [symbols(name)])[0]
                indvars = {symbols(n): self._params[n] for n in self._variables if n != name}
                self._params[name] = eq.evalf(subs=indvars)
            except Exception, e:
                pass
class HasMass(Balanced):
    def __init__(self):
        super(HasMass, self).__init__(variables=['mass', 'volume', 'density'],
                                      equationStr='density - mass / volume')
class Prop(HasMass):
    def __init__(self):
        super(Prop, self).__init__()
if __name__ == "__main__":
    prop = Prop()
    prop.density = 0.1
    prop.mass = 10.0
    print prop.volume
    prop = Prop()
    prop.mass = 10.0
    prop.volume = 100.0
    print prop.density
    prop = Prop()
    prop.density = 0.1
    prop.volume = 100.0
    print prop.mass
What this makes me want to do is use multiple inheritance or decorators to automatically assign physical inter-related properties to things. So I could have another class called "Cylindrical" which defines radius, diameter, and length properties, then I could have like class DowelRod(HasMass, Cylindrical). The really tricky part here is that I would want to define the cylinder volume (volume = length * pi * radius^2) and let that volume interact with the volume defined in the mass balance equations... such that, perhaps mass would respond to a change in length, etc. Not only is the multiple inheritance tricky, but automatically combining relationships will be even worse. This will get tricky very quickly. I don't know how to handle systems of equations yet, and it's clear with lots of parametric relationships that parameter locking or specifying independent/dependent variables will be necessary.
While I have no experience with this kind of models, and little experience with SymPy, here's some tips:
@property
def mass(self):
    return self._params['m']
@mass.setter
def mass(self, value):
    param = 'm'
    self._params[param] = value
    self.balance(param)
@property
def volume(self):
    return self._params['v']
@volume.setter
def volume(self, value):
    param = 'v'
    self._params[param] = value
    self.balance(param)
As you've noticed, you're repeating a lot of code for each variable. This is unnecessary, and since you'll have lots of variables, this eventually leads to code maintenance nightmares. You have your variables neatly arranged in varString = 'm v rho'. My suggestion is to go further and define a dictionary:
my_vars= {"m":"mass", "v":"volume", "rho":"density"}
and then add the properties and setters dynamically to the class (instead of explicitly):
from functools import partial
def var_getter(varname, self):
    return self._params[varname]
def var_setter(varname, self, value):
    self._params[varname] = value
    self.balance(varname)
for k,v in my_vars.items():
    # k is the short key used in self._params ('m'), v is the property name ('mass')
    setattr(Stuff, v, property(fget=partial(var_getter, k), fset=partial(var_setter, k)))
This way, you only need to write the getter and setter once.
If you want to have a couple of different getters, you can still use this technique. Store either the getter for each variable or the variables for each getter - whichever is more convenient.
Another thing that may be useful once equations get complex, is that you can keep equations as strings in your source:
density_eq = sympy.sympify( "rho - m / v" )
Using these two "tricks", you may even want to keep your variables and your equations defined in external text files, or maybe CSV.
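As a small illustration of keeping the equation as a string, the sketch below parses it with sympify and solves for whichever of the three variables was not supplied. It is written in Python 3 syntax and is not meant as the definitive design; solve_missing is a hypothetical helper name, and it only handles the single density equation.

```python
from sympy import solve, symbols, sympify

m, v, rho = symbols("m v rho")
density_eq = sympify("rho - m / v")  # solve() treats this expression as "== 0"

def solve_missing(known):
    """known maps two of (m, v, rho) to numbers; return (symbol, value) for the third."""
    missing = {m, v, rho} - set(known)
    if len(missing) != 1:
        raise ValueError("exactly two of m, v, rho must be given")
    unknown = missing.pop()
    solutions = solve(density_eq.subs(known), unknown)
    return unknown, float(solutions[0])

print(solve_missing({m: 10.0, rho: 0.1}))
```

Here the call should report v with a value of about 100, since the substituted equation is 0.1 - 10.0/v = 0.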
Viewing your problem as a constraint-solving problem, you may also want to look at python-constraint (http://labix.org/python-constraint) in addition to SymPy.
Just realized that the python-constraint package doesn't apply here since the domain needs to be finite. If the domain were finite though, here's an illustrative example:
import constraint as cst
p = cst.Problem()
p.addVariables(['rho','m', 'v'], range(100))
p.addConstraint(lambda rho,m,v: rho * v == m, ['rho', 'm', 'v'])
p.addConstraint(lambda rho: rho == 2, ['rho']) # setting inputs; for illustration only
p.addConstraint(lambda m: m == 10, ['m'])
print p.getSolutions()
[{'m': 10, 'rho': 2, 'v': 5}]
However, since the real domain is needed here, the package doesn't apply.