Iterating over a large unicode list taking a long time?

I'm working with the program Autodesk Maya.
I've made a naming convention script that names each item in the scene according to a chosen convention. To do that, it lists every item in the scene, checks whether the chosen name matches any current name, renames the item, and then rechecks the scene once more for duplicates.
However, when I run the code it can take 30 seconds to a minute or more to finish. At first I had no idea what was making it slow, since it worked fine in a scene with relatively few items. But when I put print statements in the scene-checking code, I saw that checking through all the items in the scene for duplicates was what took so long.
The ls() command returns a unicode list of all the items in the scene. This list can be relatively large, a thousand entries or more if the scene has even a moderate amount of items; a normal scene would be several times larger than my current testing scene (which has about 794 items in this list).
Is this supposed to take this long? Is the method I'm using to compare things inefficient? I'm not sure what to do here; the code is taking an excessive amount of time. I also wonder if it could be anything else in the code, but this seems like the likely culprit.
Here is the relevant code:
import collections
from string import Template

from maya.cmds import ls


class Name(object):
    """A naming convention class that runs passed arguments through a user
    dictionary and returns a formatted string of the user's naming convention.
    """

    def __init__(self, user_conv):
        self.user_conv = user_conv
        # an example of a user convention is '${prefix}_${name}_${side}_${objtype}'

    @staticmethod
    def abbrev_lib(word):
        # a dictionary of abbreviated words lives here; takes in a string and
        # returns the abbreviated string, or the given string if not found
        return word  # placeholder for the real lookup

    @staticmethod
    def check_scene(name):
        """Checks entire scene for the same name. Returns '' if a duplicate
        exists, otherwise returns the name unchanged.

        Keyword Arguments:
        name -- (string) name of object to be checked
        """
        scene = ls()
        match = [x for x in scene if isinstance(x, collections.Iterable)
                 and (name in x)]
        if not match:
            return name
        else:
            return ''

    def convert(self, prefix, name, side, objtype):
        """Converts given information about an object into the user-specified
        convention.

        Keyword Arguments:
        prefix -- what is prefixed before the name
        name -- name of the object or node
        side -- what side the object is on, for example 'left' or 'right'
        objtype -- the type of the object, for example 'joint' or 'multiplyDivide'
        """
        prefix = self.abbrev_lib(prefix)
        name = self.abbrev_lib(name)
        side = ''.join([self.abbrev_lib(x) for x in side])
        objtype = self.abbrev_lib(objtype)
        i = 2
        checked = ''
        subs = {'prefix': prefix, 'name': name, 'side': side, 'objtype': objtype}
        while checked == '':
            newname = Template(self.user_conv.lower())
            newname = newname.safe_substitute(**subs)
            newname = newname.strip('_')
            newname = newname.replace('__', '_')
            checked = self.check_scene(newname)
            if checked == '' and i < 100:
                subs['objtype'] = '%s%s' % (objtype, i)
                i += 1
            else:
                break
        return checked

Are you running this many times? You are potentially trawling a list of several hundred or a few thousand items on each pass through the while checked == '' loop, which would be a likely culprit. FWIW, prints are also very slow in Maya, especially if you're printing a long list, so doing that many times will definitely be slow no matter what.
I'd try a couple of things to speed this up:
Limit your searches to one node type at a time. Why trawl through hundreds of random nodes if you only care about multiplyDivide right now?
Use a set or a dictionary to search, rather than a list; sets and dictionaries are hash-based, so membership tests are much faster than scanning a list.
If you're worried about maintaining a naming convention, definitely design it to be resistant to Maya's default behavior, which is to append numeric suffixes to keep names unique. Any naming convention that doesn't support this will be a pain in the butt for all time, because you can't prevent Maya from doing it in the ordinary course of business. On the other hand, if you use that for differentiating instances, you don't need to do any uniquification at all: just use rename() on the object and capture the result. The weakness there is that Maya won't rename for global uniqueness, only local, so if you want unique node names for things that are not siblings, you have to do it yourself.
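For instance, a minimal sketch of letting Maya do the uniquification (assuming maya.cmds is available; 'some_node' is a placeholder):

from maya import cmds

# rename() returns the name Maya actually assigned, which may have a
# numeric suffix appended if the requested name was already taken
actual = cmds.rename('some_node', 'jnt_arm')  # e.g. returns 'jnt_arm1'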
Here's some cheapie code for finding unique node names:
def get_unique_scene_names(*nodeTypes):
    if not nodeTypes:
        nodeTypes = ('transform',)
    results = {}
    for longname in cmds.ls(type=nodeTypes, l=True):
        shortname = longname.rpartition("|")[-1]
        if shortname not in results:
            results[shortname] = set()
        results[shortname].add(longname)
    return results

def add_unique_name(item, node_dict):
    shortname = item.rpartition("|")[-1]
    if shortname in node_dict:
        node_dict[shortname].add(item)
    else:
        node_dict[shortname] = set([item])

def remove_unique_name(item, node_dict):
    shortname = item.rpartition("|")[-1]
    existing = node_dict.get(shortname, set())
    if item in existing:
        existing.remove(item)

def apply_convention(node, new_name, node_dict):
    if new_name not in node_dict:
        renamed_item = cmds.ls(cmds.rename(node, new_name), l=True)[0]
        remove_unique_name(node, node_dict)
        add_unique_name(renamed_item, node_dict)
        return renamed_item
    else:
        for n in range(99999):
            possible_name = new_name + str(n + 1)
            if possible_name not in node_dict:
                renamed_item = cmds.ls(cmds.rename(node, possible_name), l=True)[0]
                remove_unique_name(node, node_dict)  # drop the old name too
                add_unique_name(renamed_item, node_dict)
                return renamed_item
        raise RuntimeError("Too many duplicate names")
To use it with a particular node type, you just supply the would-be name when calling apply_convention(). The following would rename all the joints in the scene (naively!) to 'jnt_X' while keeping the suffixes unique. You'd do something smarter than that, like your original code did; this just makes sure the leaf names are unique:
joint_names = get_unique_scene_names('joint')
existing = cmds.ls(type='joint', l=True)
existing.sort()
existing.reverse()
# sort and reverse so it works from leaves backwards!
for item in existing:
    apply_convention(item, 'jnt_', joint_names)

# check the uniqueness constraint by looking at how many items share a short name in the dict:
for d in joint_names:
    print d, len(joint_names[d])
But, like I said, plan for those damn numeric suffixes; Maya makes them all the time without asking for permission, so you can't fight 'em :(

Instead of running ls() for each and every name, you should run it once and store the result in a set (an unordered collection with fast membership tests). Then check against that set when you run check_scene:
def check_scene(self, name):
    """Checks entire scene for the same name. Returns '' if a duplicate
    exists, otherwise returns the name unchanged.

    Keyword Arguments:
    name -- (string) name of object to be checked
    """
    if not hasattr(self, 'scene'):
        self.scene = set(ls())
    if name not in self.scene:
        return name
    else:
        return ''
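One caveat: the cached set goes stale as you rename things, so keep it in sync. A minimal sketch (update_scene_cache is a hypothetical helper, not from the original code):

def update_scene_cache(scene_names, old_name, new_name):
    # scene_names is the set built from ls(); discard the name that no
    # longer exists and record the name that was just assigned
    scene_names.discard(old_name)
    scene_names.add(new_name)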

Related

What's slowing down this piece of python code?

I have been trying to implement the Stupid Backoff language model (the description is available here, though I believe the details are not relevant to the question).
The thing is, the code works and produces the expected result, but more slowly than I expected. I figured out that the part slowing everything down is here (and NOT in the training part):
def compute_score(self, sentence):
    length = len(sentence)
    assert length <= self.n
    if length == 1:
        word = tuple(sentence)
        return float(self.ngrams[length][word]) / self.total_words
    else:
        words = tuple(sentence[::-1])
        count = self.ngrams[length][words]
        if count == 0:
            return self.alpha * self.compute_score(sentence[1:])
        else:
            return float(count) / self.ngrams[length - 1][words[:-1]]

def score(self, sentence):
    """ Takes a list of strings as argument and returns the log-probability of the
    sentence using your language model. Use whatever data you computed in train() here.
    """
    output = 0.0
    length = len(sentence)
    for idx in range(length):
        if idx < self.n - 1:
            current_score = self.compute_score(sentence[:idx+1])
        else:
            current_score = self.compute_score(sentence[idx-self.n+1:idx+1])
        output += math.log(current_score)
    return output
self.ngrams is a nested dictionary that has n entries. Each of these entries is a dictionary mapping tuples of the form (word_i, word_i-1, word_i-2, ..., word_i-n) to the count of that combination.
self.alpha is a constant that defines the penalty for backing off to the (n-1)-gram.
self.n is the maximum length of the tuple that the program looks for in the dictionary self.ngrams. It is set to 3 (though setting it to 2 or even 1 doesn't change anything). It's weird because the unigram and bigram models work just fine, in fractions of a second.
The answer that I am looking for is not a refactored version of my own code, but rather a tip as to which part of it is the most computationally expensive (so that I can figure out how to rewrite it myself and get the most educational profit from solving this problem).
Please be patient, I am but a beginner (two months into the world of programming). Thanks.
UPD:
I timed the running time with the same data using time.time() (values are in seconds):
Unigram = 1.9
Bigram = 3.2
Stupid Backoff (n=2) = 15.3
Stupid Backoff (n=3) = 21.6
(Measured on somewhat bigger data than originally, because of time.time's poor precision.)
If the sentence is very long, most of the code that's actually running is here:
def score(self, sentence):
    for idx in range(len(sentence)):  # should use xrange in Python 2!
        self.compute_score(sentence[idx-self.n+1:idx+1])

def compute_score(self, sentence):
    words = tuple(sentence[::-1])
    count = self.ngrams[len(sentence)][words]
    if count == 0:
        self.compute_score(sentence[1:])
    else:
        self.ngrams[len(sentence) - 1][words[:-1]]
That's not meant to be working code; it just strips out the unimportant parts.
The flow in the critical path is therefore, for each word in the sentence:
1. Call compute_score() on that word plus the following two. This creates a new list of length 3; you could avoid that with itertools.islice() (a sketch follows the summary below).
2. Construct a 3-tuple with the words reversed. This creates a new tuple; you could avoid that by passing the -1 step argument when making the slice outside this function.
3. Look up self.ngrams, a nested dict, with the first key being a number (this level might be faster as a list, since there are only three keys anyway?) and the second being the tuple just created.
4. Either recurse with the first word removed, i.e. make a new tuple (sentence[2], sentence[1]), or
5. do another lookup in self.ngrams, implicitly creating another new tuple (words[:-1]).
In summary, I think the biggest problem you have is the repeated and nested creation and destruction of lists and tuples.
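To illustrate one way of cutting that churn (my sketch, not the poster's code): a deque-based sliding window yields the reversed n-gram tuple at each position without building a list slice plus a reversed copy every time:

import collections

def reversed_windows(sentence, n):
    # yields (w_i, w_i-1, ..., w_i-n+1) at each position i
    window = collections.deque(maxlen=n)  # holds at most the last n words
    for word in sentence:
        window.appendleft(word)  # newest word first == reversed order
        yield tuple(window)      # one small tuple per position, nothing else

# usage sketch: these tuples are exactly what the ngrams dicts are keyed by
for words in reversed_windows(['the', 'cat', 'sat'], 3):
    print(words)  # ('the',) then ('cat', 'the') then ('sat', 'cat', 'the')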

How to add an unlimited number of class instances with user input in python

I am trying to use:
class reader:
    def __init__(self, name, booksread):
        self.name = name
        self.booksread = booksread

while True:
    option = input("Choose an option: ")
    if option == 1:
        # What to put here?
I want to create an unlimited number of instances of the reader class, but I could only figure out how to do it a fixed number of times, by using a separate variable for each instance. I also need to access the info later (without losing it). Is it possible to do this with a class? Or would I be better off with a list or a dictionary?
First: if option == 1: is always false in Python 3, because input() only returns strings there.
Second: Python lists can be extended until you run out of RAM.
So the solution is to create a list in the surrounding code and call append on it every time you have a new item:
mylist = []
while True:
    mylist.append(1)
It's perfectly possible to populate a data structure (such as a list or dict) with instances of a class. Given your code example, you could put the instances into a list:
class reader:
    def __init__(self, name, booksread):
        self.name = name
        self.booksread = booksread

readers = []
while True:
    option = input("Choose an option: ")
    if option == "1":  # input() returns a string, so compare against "1"
        readers.append(reader(name, booksread))
Note: I don't know how you are obtaining the values for 'name' or 'booksread', so their values in the readers.append() line are just placeholders.
To access the instances in that list, you can then iterate over it, or access elements by index, e.g.
# access each element of the list and print the name
for r in readers:
    print(r.name)

# print the name of the first element of the list
print(readers[0].name)
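Since you also asked about dictionaries: if you need to look a reader up by name later, a dict keyed on the name is a natural fit. A small sketch (the sample values are placeholders):
readers_by_name = {}
readers_by_name["Alice"] = reader("Alice", 12)  # key by whatever you search on
print(readers_by_name["Alice"].booksread)       # direct lookup, no loop needed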

list of lists of dictionaries?

I need to create a structure, in my mind similar to an array of linked lists (where a python list = array and dictionary = linked list). I have a list called blocks, and this is something like what I am looking to make:
blocks[0] = {dictionary},{dictionary},{dictionary},...
blocks[1] = {dictionary},{dictionary},{dictionary},...
etc..
Currently I build the blocks like this:
blocks = []
blocks.append([])
blocks.append([])
blocks.append([])
blocks.append([])
I know that must look ridiculous. I just cannot see in my head what that just made, which is part of my problem. I assign to a block from a different list of dictionary items. Here is a brief overview of how a single block is created...
hold = {}
hold['file'] = file
hold['count'] = count
hold['mass'] = mass_lbs
mg1.append(hold)
# this append can happen several times to mg1

blocks[i].append(mg1[j])
# where i is the index of the block I want to append to, and j is the list index
# of whichever dictionary item of mg1 I want to grab
The reason I want these four main indices in blocks is to keep the code short: one list instead of four separate variables block1, block2, block3, block4, which would make the code much longer than it is now.
Okay, going off of what was discussed in the comments: you're looking for a simple way to create a structure that is a list of four items, where each item is a list of dictionaries, and all the dictionaries in one of those lists have the same keys but not necessarily the same values. If you know exactly what keys each dictionary will have and that never changes, it might be worth making them classes that wrap dictionaries, and having each of the four lists be a list of objects. This is easier to keep in your head, and a bit more Pythonic in my opinion. You also gain the guarantee that the keys are fixed, and you can define helper methods. And by emulating the methods of a container type, you can still use dictionary syntax.
class BlockA:
    def __init__(self):
        self.dictionary = {'file': None, 'count': None, 'mass': None}

    def __len__(self):
        return len(self.dictionary)

    def __getitem__(self, key):
        return self.dictionary[key]

    def __setitem__(self, key, value):
        if key in self.dictionary:
            self.dictionary[key] = value
        else:
            raise KeyError(key)

    def __repr__(self):
        return str(self.dictionary)

block1 = BlockA()
block1['file'] = "test"
block2 = BlockA()
block2['file'] = "other test"
Now, you've got a guarantee that all instances of your first block object will have the same keys and no additional keys. You can make similar classes for your other blocks, or some general class, or some mix of the two using inheritance. Now to make your data structure:
blocks = [[block1, block2], [], [], []]
print(blocks)  # or "print blocks" if you're not using Python 3.x
blocks[0][0]['file'] = "some new file"
print(blocks)
It might also be worthwhile to have a class for this blocks container, with specific methods for adding and accessing blocks of each type, as sketched below. That way you wouldn't trip yourself up by accidentally adding the wrong kind of block to one of the four lists, or similar issues. But depending on how much you'll use this structure, that could be overkill.
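For illustration (the class layout and block labels here are my invention, not from the question):

class Blocks:
    # container for the four block lists, so callers can't mix them up
    TYPES = ('a', 'b', 'c', 'd')  # hypothetical labels for the four kinds

    def __init__(self):
        self._lists = {t: [] for t in self.TYPES}

    def add(self, block_type, block):
        if block_type not in self._lists:
            raise KeyError("unknown block type: %s" % block_type)
        self._lists[block_type].append(block)

    def get(self, block_type):
        return self._lists[block_type]

blocks = Blocks()
blocks.add('a', block1)  # block1 as defined above
print(blocks.get('a'))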

Enforce algebraic relationships between object attributes with SymPy

I'm interested in using SymPy to augment my engineering models. Instead of defining a rigid set of inputs and outputs, I'd like for the user to simply provide everything they know about a system, then apply that data to an algebraic model which calculates the unknowns (if it has enough data).
For an example, say I have some stuff which has some mass, volume, and density. I'd like to define a relationship between those parameters (density = mass / volume) such that when the user has provided enough information (any 2 variables), the 3rd variable is automatically calculated. Finally, if any value is later updated, one of the other values should change to preserve the relationship. One of the challenges with this system is that when there are multiple independent variables, there would need to be a way to specify which independent variable should change to satisfy the requirement.
Here's some working code that I have currently:
from sympy import *

class Stuff(object):
    def __init__(self, *args, **kwargs):
        # String of variables
        varString = 'm v rho'
        # Initialize symbolic variables
        m, v, rho = symbols(varString)
        # Define density equation
        # "rho = m / v" becomes "rho - m / v = 0", so the eqn. is rho - m / v
        # (solve() assumes equation = 0)
        density_eq = rho - m / v
        # Store the equation and the variable string for later use
        self._eqn = density_eq
        self._varString = varString
        # Get a list of variable names
        variables = varString.split()
        # Initialize parameters dictionary
        self._params = {name: None for name in variables}

    @property
    def mass(self):
        return self._params['m']

    @mass.setter
    def mass(self, value):
        param = 'm'
        self._params[param] = value
        self.balance(param)

    @property
    def volume(self):
        return self._params['v']

    @volume.setter
    def volume(self, value):
        param = 'v'
        self._params[param] = value
        self.balance(param)

    @property
    def density(self):
        return self._params['rho']

    @density.setter
    def density(self, value):
        param = 'rho'
        self._params[param] = value
        self.balance(param)

    def balance(self, param):
        # Get the list of all variable names
        variables = self._varString.split()
        # Get a copy of the list except for the recently changed parameter
        others = [name for name in variables if name != param]
        # Loop through the less recently changed variables
        for name in others:
            try:
                # Symbolically solve for the current variable
                eq = solve(self._eqn, [symbols(name)])[0]
                # Variables and values to substitute for a numerical solution
                # (will contain None for unset values)
                indvars = {symbols(n): self._params[n] for n in variables if n != name}
                # Set the parameter with the new numeric solution
                self._params[name] = eq.evalf(subs=indvars)
            except Exception:
                pass

if __name__ == "__main__":
    # Run some examples - need to turn these into actual tests
    stuff = Stuff()
    stuff.density = 0.1
    stuff.mass = 10.0
    print stuff.volume

    stuff = Stuff()
    stuff.mass = 10.0
    stuff.volume = 100.0
    print stuff.density

    stuff = Stuff()
    stuff.density = 0.1
    stuff.volume = 100.0
    print stuff.mass

    # increase volume
    print "Setting Volume to 200"
    stuff.volume = 200.0
    print "Mass changes"
    print "Mass {0}".format(stuff.mass)
    print "Density {0}".format(stuff.density)

    # set mass to 15
    print "Setting Mass to 15.0"
    stuff.mass = 15.0
    print "Volume changes"
    print "Volume {0}".format(stuff.volume)
    print "Density {0}".format(stuff.density)

    # set density to 0.5
    print "Setting Density to 0.5"
    stuff.density = 0.5
    print "Mass changes"
    print "Mass {0}".format(stuff.mass)
    print "Volume {0}".format(stuff.volume)

    print "It is impossible to let mass and volume drive density with this setup, since " \
          "either mass or volume will always be recalculated first."
I tried to be as elegant as I could be with the overall approach and layout of the class, but I can't help but wonder if I'm going about it wrong - if I'm using SymPy in the wrong way to accomplish this task. I'm interested in developing a complex aerospace vehicle model with dozens/hundreds of interrelated properties. I'd like to find an elegant and extensible way to use SymPy to govern property relationships across the vehicle before I scale up from this fairly simple example.
I'm also concerned about how and when to rebalance the equations. I'm familiar with PyQt signals and slots, which is my first thought for how to link dependent things together to trigger a model update (emit a signal when a value updates, to be received by rebalancing functions for each system of equations that relies on that parameter?). Yeah, I really don't know the best way to do this with SymPy. I might need a bigger example to explore systems of equations.
Here's some thoughts on where I'm headed with this project. Just using mass as an example, I would like to define the entire vehicle mass as the sum of the subsystem masses, and all subsystem masses as the sum of component masses. In addition, mass relationships will exist between certain subsystems and components. These relationships will drive the model until more concrete data is provided. So if the default ratio of fuel mass to total vehicle mass is 50%, then specifying 100lb of fuel will size the vehicle at 200lb. However if I later specify that the vehicle is actually 210lb, I would want the relationship recalculated (let it become dependent since fuel mass and vehicle mass were set more recently, or because I specified that they are independent variables or locked or something). The next problem is iteration. When circular or conflicting relationships exist in a model, the model must be iterated on to hopefully converge on a solution. This is often the case with the vehicle mass model described above. If the vehicle gets heavier, there needs to be more fuel to meet a requirement, which causes the vehicle to get even heavier, etc. I'm not sure how to leverage SymPy in these situations.
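On the iteration worry: for the fuel/vehicle example specifically, SymPy can often solve such "circular" relationships directly rather than iterating, since on paper they are just simultaneous equations. A toy sketch (my example, not part of the model above):

from sympy import symbols, solve

vehicle, dry, fuel = symbols('vehicle dry fuel')
# circular on paper: vehicle mass depends on fuel, fuel depends on vehicle
eqs = [vehicle - (dry + fuel),  # vehicle mass = dry mass + fuel mass
       fuel - 0.5 * vehicle]    # fuel is 50% of total vehicle mass
print(solve(eqs, [vehicle, fuel]))  # {vehicle: 2.0*dry, fuel: 1.0*dry}
# with a concrete dry mass, the numbers drop out directly:
print(solve([e.subs(dry, 100.0) for e in eqs], [vehicle, fuel]))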
Any suggestions?
PS
Good explanation of the challenges associated with space launch vehicle design.
Edit: Changing the code structure based on goncalopp's suggestions...
from functools import partial
from sympy import sympify, symbols, solve

class Balanced(object):
    def __init__(self, variables, equationStr):
        self._variables = variables
        self._equation = sympify(equationStr)
        # Initialize parameters dictionary
        self._params = {name: None for name in self._variables}

        def var_getter(varname, self):
            return self._params[varname]

        def var_setter(varname, self, value):
            self._params[varname] = value
            self.balance(varname)

        for varname in self._variables:
            setattr(Balanced, varname, property(fget=partial(var_getter, varname),
                                                fset=partial(var_setter, varname)))

    def balance(self, recentlyChanged):
        # Get a copy of the list except for the recently changed parameter
        others = [name for name in self._variables if name != recentlyChanged]
        # Loop through the less recently changed variables
        for name in others:
            try:
                eq = solve(self._equation, [symbols(name)])[0]
                indvars = {symbols(n): self._params[n] for n in self._variables if n != name}
                self._params[name] = eq.evalf(subs=indvars)
            except Exception:
                pass

class HasMass(Balanced):
    def __init__(self):
        super(HasMass, self).__init__(variables=['mass', 'volume', 'density'],
                                      equationStr='density - mass / volume')

class Prop(HasMass):
    def __init__(self):
        super(Prop, self).__init__()

if __name__ == "__main__":
    prop = Prop()
    prop.density = 0.1
    prop.mass = 10.0
    print prop.volume

    prop = Prop()
    prop.mass = 10.0
    prop.volume = 100.0
    print prop.density

    prop = Prop()
    prop.density = 0.1
    prop.volume = 100.0
    print prop.mass
What this makes me want to do is use multiple inheritance or decorators to automatically assign physical, inter-related properties to things. So I could have another class called "Cylindrical" which defines radius, diameter, and length properties, and then have something like class DowelRod(HasMass, Cylindrical). The really tricky part is that I would want to define the cylinder volume (volume = length * pi * radius^2) and let that volume interact with the volume defined in the mass balance equations, such that, perhaps, mass would respond to a change in length, etc. Not only is the multiple inheritance tricky, but automatically combining relationships will be even worse. This will get tricky very quickly. I don't know how to handle systems of equations yet, and it's clear that with lots of parametric relationships, parameter locking or specifying independent/dependent variables will be necessary.
While I have no experience with this kind of model, and little experience with SymPy, here are some tips:
@property
def mass(self):
    return self._params['m']

@mass.setter
def mass(self, value):
    param = 'm'
    self._params[param] = value
    self.balance(param)

@property
def volume(self):
    return self._params['v']

@volume.setter
def volume(self, value):
    param = 'v'
    self._params[param] = value
    self.balance(param)
As you've noticed, you're repeating a lot of code for each variable. This is unnecessary, and since you'll have lots of variables, it eventually leads to code maintenance nightmares. You already have your variables neatly arranged in varString = 'm v rho'. My suggestion is to go further and define a dictionary:
my_vars = {"m": "mass", "v": "volume", "rho": "density"}
and then add the properties and setters dynamically to the class (instead of explicitly):
from functools import partial

def var_getter(varname, self):
    return self._params[varname]

def var_setter(varname, self, value):
    self._params[varname] = value
    self.balance(varname)

for short, long_name in my_vars.items():
    # expose e.g. 'mass' as a property backed by self._params['m']
    setattr(Stuff, long_name, property(fget=partial(var_getter, short),
                                       fset=partial(var_setter, short)))
This way, you only need to write the getter and setter once.
If you want to have a couple of different getters, you can still use this technique. Store either the getter for each variable or the variables for each getter - whichever is more convenient.
Another thing that may be useful once equations get complex, is that you can keep equations as strings in your source:
density_eq = sympy.sympify("rho - m / v")
Using these two "tricks", you may even want to keep your variables and your equations defined in external text files, or maybe CSV.
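For instance, a minimal sketch of loading the variable definitions from a CSV file (the file name and column layout are my invention):

import csv

# variables.csv would hold one row per quantity, e.g.:
#   short,long
#   m,mass
#   v,volume
#   rho,density
my_vars = {}
with open('variables.csv') as f:
    for row in csv.DictReader(f):
        my_vars[row['short']] = row['long']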
Viewing your problem as a constraint-solving problem, you may also want to look at python-constraint (http://labix.org/python-constraint) in addition to SymPy.
Just realized that the python-constraint package doesn't apply here, since the domain needs to be finite. If the domain were finite, though, here's an illustrative example:
import constraint as cst

p = cst.Problem()
p.addVariables(['rho', 'm', 'v'], range(100))
p.addConstraint(lambda rho, m, v: rho * v == m, ['rho', 'm', 'v'])
p.addConstraint(lambda rho: rho == 2, ['rho'])  # setting inputs; for illustration only
p.addConstraint(lambda m: m == 10, ['m'])
print p.getSolutions()
# [{'m': 10, 'rho': 2, 'v': 5}]
However, since the real domain is needed here, the package doesn't apply.

How to make this django attribute name search better?

lcount = Open_Layers.objects.all()
form = SearchForm()
if request.method == 'POST':
    form = SearchForm(request.POST)
    if form.is_valid():
        data = form.cleaned_data
        val = form.cleaned_data['LayerName']
        a = Open_Layers()
        data = []
        for e in lcount:
            if e.Layer_name == val:
                data = val
        return render_to_response('searchresult.html', {'data': data})
    else:
        form = SearchForm()
else:
    return render_to_response('mapsearch.html', {'form': form})
This only returns a result if a name matches exactly. How do I change it so that a search for "Park" returns Park1, Park2, Parking, Parkin, i.e. all occurrences containing "Park"?
You can improve your search logic by using a list to accumulate the results and the re module to match a larger set of words.
However, this is still pretty limited, error-prone, and hard to maintain, and even harder to evolve. Plus, you'll never get results as good as a real search engine's.
So instead of trying to manually reinvent the wheel, the car, and the highway, you should spend some time setting up haystack. This is now the de facto standard for search in Django.
Use Whoosh as a backend at first; it's easier to set up. If your search gets slow, replace it with Solr.
EDIT:
Simple clean alternative:
Open_Layers.objects.filter(Layer_name__icontains=val)
This will perform a SQL LIKE query, adding the % wildcards for you.
This is going to kill your database if used too often, but I guess that's probably not an issue for your current project.
BTW, you probably want to rename Open_Layers to OpenLayers, per the PEP 8 naming convention for classes.
Instead of
if e.Layer_name == val:
    data = val
use
if val in e.Layer_name:
    data.append(e.Layer_name)
(and you don't need the line data = form.cleaned_data)
I realise this is an old post, but anyway:
There's a fuzzy string comparison already in the Python standard library:
import difflib
Mainly have a look at:
difflib.SequenceMatcher(None, a='string1', b='string2', autojunk=True).ratio()
More info here:
http://docs.python.org/library/difflib.html#sequencematcher-objects
It returns a ratio of how close the two strings are, between zero and one. So instead of testing whether they're equal, you choose a similarity-ratio threshold.
Things to watch out for: you may want to convert both strings to lower case first, with string1.lower().
Also note you may want to implement your favourite method of splitting the string, i.e. .split() or something using re, so that a search for 'David' against 'David Brent' ranks higher.
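A small sketch of that idea (best_token_ratio is my name for the helper):

import difflib

def best_token_ratio(query, text):
    # compare the query against each word of the text and keep the best
    # score, so 'David' vs 'David Brent' still ranks highly
    query = query.lower()
    return max(difflib.SequenceMatcher(None, query, word.lower()).ratio()
               for word in text.split())

print(best_token_ratio('David', 'David Brent'))  # 1.0
print(best_token_ratio('Dave', 'David Brent'))   # ~0.67, best match is 'David'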