Refactor large if/elif/else clauses

I'm working on a project that involves "building" SQL queries.
As part of the build process, I often reach a point where one build method has to take a lot of boolean conditions (or flags) into account in its calculations.
This usually makes the method look like a big if/elif/else tree (consider this obviously bad piece of code):
def build(a: bool, b: bool, c: bool, d: bool, e: bool):
    output = ...
    if a:
        if b:
            if c:
                pass  # mutate output
            else:
                pass  # mutate output
        else:
            if c:
                if d:
                    pass  # mutate output
                else:
                    pass  # mutate output
            else:
                if d:
                    pass  # mutate output
                else:
                    pass  # mutate output
    elif e or b:
        pass
    else:
        pass
    return output
My question is: how would you refactor such code?
I'm looking for some kind of design pattern or similar method that will make the code:
- more readable
- able to support a new flag that "build" must take into account (as the project progresses, we keep finding new cases and features we'd like to add to the calculations)
- easy to test
My project is written in Python 3.8, but I believe it's a general problem that may be relevant to any other programming language.
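One common way to flatten a tree like this (a technique suggested here, not taken from the thread) is an ordered rule table: each leaf of the tree becomes a (predicate, action) pair, and the first predicate that matches wins, exactly like an if/elif chain. The sketch below assumes the flags can be bundled into a small dataclass; `Flags`, `RULES` and the string labels are hypothetical stand-ins for the real SQL-building steps, and it sticks to `typing` generics so it runs on Python 3.8.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Flags:
    a: bool = False
    b: bool = False
    c: bool = False
    d: bool = False
    e: bool = False


# Each rule pairs a predicate over the flags with a (hypothetical) result label.
Rule = Tuple[Callable[[Flags], bool], str]

# Order matters: rules are checked top to bottom, like the original if/elif tree.
RULES: List[Rule] = [
    (lambda f: f.a and f.b and f.c, "branch abc"),
    (lambda f: f.a and f.b, "branch ab"),
    (lambda f: f.a and f.c and f.d, "branch acd"),
    (lambda f: f.a and f.c, "branch ac"),
    (lambda f: f.a and f.d, "branch ad"),
    (lambda f: f.a, "branch a"),
    (lambda f: f.e or f.b, "branch e_or_b"),
    (lambda f: True, "default"),  # the final else
]


def build(flags: Flags) -> str:
    """Return the result of the first rule whose predicate matches."""
    for predicate, result in RULES:
        if predicate(flags):
            return result
    raise ValueError("no rule matched")  # unreachable while a catch-all rule exists


print(build(Flags(a=True, c=True)))  # branch ac
```

Supporting a new flag then means adding a field to `Flags` and inserting rules at the right position, and each predicate can be unit-tested on its own.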


Assign nested function to variable with parameter

Disclaimer: my title may not accurately describe what I'm trying to accomplish, but I can update it if someone can correct my terminology.
I have two functions, each with a separate purpose and usable on its own. Occasionally I would like to combine the two to perform both actions at once and return a single result, and I would like to assign that combination to a variable name.
I know I can create a third function that does basically what I want, as it is really simple, though finding another way of doing this has become a bit of a challenge to myself.
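def str2bool(string):
    return string.lower() in ("yes", "true", "t", "1")
def get_setting(string):
    if string == 'cat':
        return 'yes'
    else:
        return 'no'
# this attempt fails: str2bool() is called right away with the function
# object itself, so get_setting.lower() raises AttributeError
VALID_BOOL = str2bool(get_setting)
print(VALID_BOOL('cat'))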
So basically I would like to assign the combination of the two functions to a variable that I can call, passing in the string parameter to evaluate.
In my real-world code, get_setting() would retrieve a user setting and return its value; I would then like to test that value and return it as a boolean.
Again, I know I can just create a third function that gets the value and does the quick test, but this is more for learning: I want to see whether it can be done the way I'm trying. So far my different variations of assigning and calling aren't working. Is it even possible, or would it become too complex?
Using a lambda is easy, but I don't know if it is exactly what you are looking for.
Example:
f = lambda astring: str2bool(get_setting(astring))
Output:
>>> f('cat')
True
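If you want to build such combinations without writing a lambda by hand each time, a small composition helper works too. Python's standard library has no `compose`, so the helper below is hypothetical; it folds the functions with `functools.reduce` so that `compose(f, g)(x)` means `f(g(x))` (applied right to left) and needs at least one function.

```python
from functools import reduce
from typing import Any, Callable


def str2bool(string: str) -> bool:
    return string.lower() in ("yes", "true", "t", "1")


def get_setting(string: str) -> str:
    return "yes" if string == "cat" else "no"


def compose(*funcs: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """compose(f, g)(x) == f(g(x)); functions are applied right to left."""
    return reduce(lambda f, g: lambda x: f(g(x)), funcs)


VALID_BOOL = compose(str2bool, get_setting)
print(VALID_BOOL("cat"))  # True
print(VALID_BOOL("dog"))  # False
```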

Parse CSV efficiently in python

I am writing a CSV parser that has the following structure:
class decode:
    def __init__(self):
        self.fd = open('test.csv')
    def decodeoperation(self):
        for row in self.fd:
            cmd = self.decodecmd(row)
            if cmd == 'A':
                self.decodeAopt()
            elif cmd == 'B':
                self.decodeBopt()
    def decodeAopt(self):
        for row in self.fd:
            # decode further dependencies based on cmd A until
            # a condition occurs on some later row, then:
            return
    def decodeBopt(self):
        for row in self.fd:
            # decode further dependencies based on cmd B until
            # a condition occurs on some later row, then:
            return
The current code works fine for me, but I don't feel good about iterating through the CSV file in all the methods. Could it be done in a better way?
There is nothing inherently wrong with using a common iterator across multiple methods, as long as you can determine in advance which method to dispatch to at any given point in the sequence (which you are doing by decoding the cmd from the row and getting 'A', 'B', etc.). The design has issues if you have to read several items before you could determine which method to call, and might have to back up if you picked the wrong method and needed to try another. In parsing, this is called backtracking. Since you are passing around a file object, backing up is difficult. Note that your separate decoder methods will have to know when to stop before reading the next row that contains a command, so they will need some sort of terminating sentinel row that they can recognize.
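To illustrate the shared iterator and the terminating sentinel, here is a minimal sketch. It assumes rows are already split into lists and that each command's block ends with a row whose first field is "END"; both the row format and the `END` marker are hypothetical. Because `iter()` on an iterator returns the same object, the outer loop resumes right after the rows that `read_block` consumed.

```python
from typing import Iterable, Iterator, List, Tuple

END = "END"  # hypothetical sentinel that closes each command's block


def read_block(rows: Iterator[List[str]]) -> List[List[str]]:
    """Consume rows from the shared iterator until the sentinel (which is dropped)."""
    block = []
    for row in rows:
        if row[0] == END:
            break
        block.append(row)
    return block


def decode_all(source: Iterable[List[str]]) -> List[Tuple[str, List[List[str]]]]:
    rows = iter(source)  # one iterator shared by this loop and read_block()
    decoded = []
    for row in rows:
        cmd = row[0]
        decoded.append((cmd, read_block(rows)))
    return decoded


rows = [["A"], ["x", "1"], ["END"], ["B"], ["y", "2"], ["z", "3"], ["END"]]
print(decode_all(rows))  # [('A', [['x', '1']]), ('B', [['y', '2'], ['z', '3']])]
```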
Some general comments on your Python and class design:
You have a nice simple if-elif-elif dispatch table that can translate to a Python dict like this:
# put this code in place of your "if cmd == ... elif elif elif..." code
dispatch = {
    # note - no ()'s, we just want to reference the methods, not call them
    'A': self.decodeAopt,
    'B': self.decodeBopt,
    'C': self.decodeCopt,
    # look how easy it is to add more decoders
}
# look up which decoder to use for the current cmd
decoder = dispatch[cmd]
# run it
decoder()
# or do it all in one line
dispatch[cmd]()
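Here is a self-contained version of the dispatch idea that runs outside the parser. The handler names and the row format are hypothetical, and one small change from the snippet above is deliberate: it uses `dict.get()` with a fallback handler, so an unexpected command is logged instead of raising `KeyError`.

```python
from typing import Callable, Dict, Iterable, List

Row = List[str]


class Decoder:
    def __init__(self) -> None:
        self.log: List[str] = []
        # bound methods referenced without ()'s; a new command is one more entry
        self.dispatch: Dict[str, Callable[[Row], None]] = {
            "A": self.decode_a,
            "B": self.decode_b,
        }

    def run(self, rows: Iterable[Row]) -> None:
        for row in rows:
            # fall back to unknown() rather than raising KeyError
            handler = self.dispatch.get(row[0], self.unknown)
            handler(row)

    def decode_a(self, row: Row) -> None:
        self.log.append("A " + ",".join(row[1:]))

    def decode_b(self, row: Row) -> None:
        self.log.append("B " + ",".join(row[1:]))

    def unknown(self, row: Row) -> None:
        self.log.append("unknown " + row[0])


decoder = Decoder()
decoder.run([["A", "1"], ["B", "2", "3"], ["Z"]])
print(decoder.log)  # ['A 1', 'B 2,3', 'unknown Z']
```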
Instead of having your __init__ method open a file, let it accept an iterator object. This will make it much easier to write tests for your object, since you'll be able to pass simple Python lists containing CSV rows.
class decode:
    def __init__(self, sequence):
        self.fd = sequence
You might want to rename this var from 'fd' to something like 'seq', since it doesn't have to be a file, but could be any iterable that gives you decodable rows.
If you are doing your own CSV parsing, look at using the builtin csv module. It will do quite a bit of work for you, like parsing quoted strings that could contain commas, and can give you easy-to-work-with dicts for each row, given headers read from the input file, or specified by you. If you have modified __init__ as I suggested, you can use it like:
import csv
with open('test.csv', newline='') as f:
    # assuming test.csv has a header row, DictReader takes the keys from it;
    # if it has none, pass the names yourself instead (I encourage you to
    # give these columns better names):
    # reader = csv.DictReader(f, fieldnames=['cmd', 'val1', 'val2', 'val3'])
    reader = csv.DictReader(f)
    decoder = decode(reader)
    decoder.decodeoperation()
Then you can write in decodeoperation:
cmd = row['cmd']
Note that this would impart a slightly different design to your class, that it would expect to be given a sequence of dicts, rather than a sequence of strings.
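Putting the last two suggestions together, a quick sketch: because the decoder just iterates over whatever it's given, a test can feed it a `csv.DictReader` over an in-memory `io.StringIO` (standing in for the real file), or even a plain list of dicts. The `Decoder` class and the sample rows are hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, List


class Decoder:
    def __init__(self, seq: Iterable[Dict[str, str]]) -> None:
        self.seq = seq  # any iterable of row dicts, not necessarily a file
        self.commands: List[str] = []

    def decode_operation(self) -> None:
        for row in self.seq:
            self.commands.append(row["cmd"])


# io.StringIO stands in for open('test.csv', newline='') in a test
data = io.StringIO("A,1,2,3\nB,4,5,6\n")
reader = csv.DictReader(data, fieldnames=["cmd", "val1", "val2", "val3"])
decoder = Decoder(reader)
decoder.decode_operation()
print(decoder.commands)  # ['A', 'B']

# a plain list of dicts works just as well
list_decoder = Decoder([{"cmd": "C"}])
list_decoder.decode_operation()
print(list_decoder.commands)  # ['C']
```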

Python - null object pattern with generators

It is apparently Pythonic to return values that can be treated as 'False' versions of the successful return type, such that if MyIterableObject: do_things() is a simple way to deal with the output whether or not it is actually there.
With generators, bool(MyGenerator) is always True even if it would have a len of 0 or something equally empty. So while I could write something like the following:
result = list(get_generator(*my_variables))
if result:
    do_stuff(result)
It seems like it defeats the benefit of having a generator in the first place.
Perhaps I'm just missing a language feature or something, but what is the pythonic language construct for explicitly indicating that work is not to be done with empty generators?
To be clear, I'd like to be able to give the user some insight as to how much work the script actually did (if any) - contextual snippet as follows:
# Python 2.7
templates = files_from_folder(path_to_folder)
result = list(get_same_sections(templates))  # returns generator
if not result:
    msg("No data to sync.")
    sys.exit()
for data in result:
    for i, tpl in zip(data, templates):
        tpl['sections'][i]['uuid'] = data[-1]
msg("{} sections found to sync up.".format(len(result)))
It works, but I think that ultimately it's a waste to change the generator into a list just to see if there's any work to do, so I assume there's a better way, yes?
EDIT: I get the sense that generators just aren't supposed to be used in this way, but I will add an example to show my reasoning.
There's a semi-popular 'helper function' in Python that you see now and again when you need to traverse a structure like a nested dict or what-have-you. Usually called getnode or getn, whenever I see it, it reads something like this:
def get_node(seq, path):
    for p in path:
        if p in seq:
            seq = seq[p]
        else:
            return ()
    return seq
So in this way, you can make it easier to deal with the results of a complicated path to data in a nested structure without always checking for None or try/except when you're not actually dealing with 'something exceptional'.
mydata = get_node(my_container, ('path', 2, 'some', 'data'))
if mydata:  # could also be "for x in mydata", etc
    do_work(mydata)
else:
    something_else()
It's looking less like this kind of syntax would (or could) exist with generators, without writing a class that handles generators in this way as has been suggested.
A generator has no length; you only know how many items it yields once you've exhausted it.
The only way to find out whether it has anything at all is to start consuming it, for example by exhausting it into a list:
items = list(myGenerator)
if items:
    pass  # do something
Unless you write a class with a __nonzero__ method (__bool__ in Python 3) that internally looks at your items:
class MyGenerator(object):
    def __init__(self, items):
        self.items = items
    def __iter__(self):
        for i in self.items:
            yield i
    def __nonzero__(self):  # truth test in Python 2
        return bool(self.items)
    __bool__ = __nonzero__  # Python 3 looks for __bool__ instead
>>> bool(MyGenerator([]))
False
>>> bool(MyGenerator([1]))
True
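A way to avoid building the whole list (a swapped-in technique, not from this answer) is to pull just the first item with `next()` and stitch it back on with `itertools.chain`. The helper name `peek` and the `None`-when-empty convention are assumptions for this Python 3 sketch; since a chain object is always truthy and `None` is falsy, the result can go straight into an `if`. You still can't know the count up front, so it is tallied while doing the work.

```python
from itertools import chain
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def peek(iterable: Iterable[T]) -> Optional[Iterator[T]]:
    """Return None if iterable is empty, else an iterator over all of its items."""
    it = iter(iterable)
    try:
        first = next(it)
    except StopIteration:
        return None
    # put the item we pulled back in front of the unconsumed rest
    return chain([first], it)


def sections(n: int) -> Iterator[int]:
    # hypothetical stand-in for get_same_sections(templates)
    yield from range(n)


result = peek(sections(0))
if not result:
    print("No data to sync.")

result = peek(sections(3))
count = 0
for item in result or ():
    count += 1  # count while doing the work instead of calling len()
print("{} sections found to sync up.".format(count))  # 3 sections found to sync up.
```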

Iterating over a large unicode list taking a long time?

I'm working with the program Autodesk Maya.
I've made a naming-convention script that names each item according to a certain convention. It lists every item in the scene, checks whether the chosen name matches any current name in the scene, and if it does, changes the name and checks the whole scene again for a duplicate.
However, when I run the code, it can take 30 seconds to a minute or more to run through it all. At first I had no idea what was making my code slow, as it worked fine in a scene with relatively few items. But when I put print statements in the scene-checking code, I saw that it was taking a long time to go through all the items in the scene and check for duplicates.
The ls() command provides a unicode list of all the items in the scene. This list can be large: a thousand items or more if the scene has even a moderate amount of content, and a normal scene would be several times larger than my current test scene (which has about 794 items in this list).
Is this supposed to take this long? Is the method I'm using to compare things inefficient? I'm not sure what to do here; the code takes an excessive amount of time. I'm also wondering whether it could be something else in the code, but this seems the most likely cause.
Here is some code below.
from string import Template
import collections
from maya.cmds import ls
class Name(object):
    """A naming convention class that runs passed arguments through user
    dictionary, and returns formatted string of users input naming convention.
    """
    def __init__(self, user_conv):
        self.user_conv = user_conv
        # an example of a user convention is '${prefix}_${name}_${side}_${objtype}'
    @staticmethod
    def abbrev_lib(word):
        # a dictionary of abbreviated words is here, takes in a string
        # and returns an abbreviated string, if not found return given string
        return word  # placeholder: the real dictionary lookup is omitted
    @staticmethod
    def check_scene(name):
        """Checks entire scene for same name. If a duplicate exists,
        returns an empty string; otherwise returns name.
        Keyword Arguments:
        name -- (string) name of object to be checked
        """
        scene = ls()
        match = [x for x in scene if isinstance(x, collections.Iterable)
                 and (name in x)]
        if not match:
            return name
        else:
            return ''
    def convert(self, prefix, name, side, objtype):
        """Converts given information about object into user specified convention.
        Keyword Arguments:
        prefix -- what is prefixed before the name
        name -- name of the object or node
        side -- what side the object is on, example 'left' or 'right'
        objtype -- the type of the object, example 'joint' or 'multiplyDivide'
        """
        prefix = self.abbrev_lib(prefix)
        name = self.abbrev_lib(name)
        side = ''.join([self.abbrev_lib(x) for x in side])
        objtype = self.abbrev_lib(objtype)
        i = 2
        checked = ''
        subs = {'prefix': prefix, 'name': name, 'side': side,
                'objtype': objtype}
        while checked == '':
            newname = Template(self.user_conv.lower())
            newname = newname.safe_substitute(**subs)
            newname = newname.strip('_')
            newname = newname.replace('__', '_')
            checked = self.check_scene(newname)
            if checked == '' and i < 100:
                subs['objtype'] = '%s%s' % (objtype, i)
                i += 1
            else:
                break
        return checked
Are you running this many times? You are potentially trawling a list of several hundred or a few thousand items on every pass through the while loop, which would be a likely culprit. FWIW, prints are also very slow in Maya, especially if you're printing a long list, so doing that many times will definitely be slow no matter what.
I'd try a couple of things to speed this up:
- Limit your searches to one type at a time: why trawl through hundreds of random nodes if you only care about multiplyDivide right now?
- Use a set or a dictionary to search, rather than a list: sets and dictionaries are hash tables, so lookups are much faster.
- If you're worried about maintaining a naming convention, definitely design it to be resistant to Maya's default behavior, which is to append numeric suffixes to keep names unique. Any naming convention that doesn't support this will be a pain in the butt forever, because you can't prevent Maya from doing it in the ordinary course of business. On the other hand, if you use those suffixes to differentiate instances, you don't need to do any uniquification at all: just use rename() on the object and capture the result. The weakness there is that Maya only renames for local uniqueness, not global, so if you want unique node names for things that are not siblings, you have to do it yourself.
Here's some cheapie code for finding unique node names:
import maya.cmds as cmds
def get_unique_scene_names(*nodeTypes):
    if not nodeTypes:
        nodeTypes = ('transform',)
    results = {}
    for longname in cmds.ls(type=list(nodeTypes), l=True) or []:
        shortname = longname.rpartition("|")[-1]
        if shortname not in results:
            results[shortname] = set()
        results[shortname].add(longname)
    return results
def add_unique_name(item, node_dict):
    shortname = item.rpartition("|")[-1]
    if shortname in node_dict:
        node_dict[shortname].add(item)
    else:
        node_dict[shortname] = set([item])
def remove_unique_name(item, node_dict):
    shortname = item.rpartition("|")[-1]
    existing = node_dict.get(shortname, set())
    if item in existing:
        existing.remove(item)
def apply_convention(node, new_name, node_dict):
    if new_name not in node_dict:
        renamed_item = cmds.ls(cmds.rename(node, new_name), l=True)[0]
        remove_unique_name(node, node_dict)
        add_unique_name(renamed_item, node_dict)
        return renamed_item
    else:
        for n in range(99999):
            possible_name = new_name + str(n + 1)
            if possible_name not in node_dict:
                renamed_item = cmds.ls(cmds.rename(node, possible_name), l=True)[0]
                # drop the old long name too, as in the branch above
                remove_unique_name(node, node_dict)
                add_unique_name(renamed_item, node_dict)
                return renamed_item
        raise RuntimeError("Too many duplicate names")
To use it on a particular node type, you just supply the right would-be name when calling apply_convention(). This would rename all the joints in the scene (naively!) to 'jnt_X' while keeping the suffixes unique. You'd do something smarter than that, like your original code did - this just makes sure that leaves are unique:
joint_names = get_unique_scene_names('joint')
existing = cmds.ls(type='joint', l=True)
existing.sort()
existing.reverse()
# do this to make sure it works from leaves backwards!
for item in existing:
    apply_convention(item, 'jnt_', joint_names)
# check the uniqueness constraint by looking for how many items share a short name in the dict:
for d in joint_names:
    print("%s %d" % (d, len(joint_names[d])))
But, like I said, plan for those damn numeric suffixes: Maya makes them all the time without asking for permission, so you can't fight 'em :(
Instead of running ls for each and every name, you should run it once and store the result in a set (an unordered collection whose membership tests take constant time on average, far faster than scanning a list). Then check against that when you run check_scene:
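def check_scene(self, name):
    """Checks entire scene for same name. If a duplicate exists,
    returns an empty string; otherwise returns name.
    Keyword Arguments:
    name -- (string) name of object to be checked
    """
    # list the scene once and cache it; note the cache goes stale if
    # nodes are created or renamed after the first call
    if not hasattr(self, 'scene'):
        self.scene = set(ls())
    if name not in self.scene:
        return name
    else:
        return ''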
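The same idea outside Maya, as a pure-Python sketch: `taken` stands in for `set(cmds.ls())` built once, and each new name is added to it so later checks stay correct without re-listing the scene. `unique_name` is a hypothetical helper; unlike the question's code, it appends the number to the whole name rather than to the objtype field, and its start of 2 and limit of 100 mirror the question's loop.

```python
from typing import Set


def unique_name(base: str, taken: Set[str], start: int = 2, limit: int = 100) -> str:
    """Return base, or base plus the first free numeric suffix, and reserve it.

    taken is a set, so each `in` test is an average O(1) hash lookup
    instead of a scan over every node in the scene.
    """
    if base not in taken:
        taken.add(base)
        return base
    for i in range(start, limit):
        candidate = "%s%d" % (base, i)
        if candidate not in taken:
            taken.add(candidate)
            return candidate
    raise RuntimeError("Too many duplicate names")


taken = {"l_arm_jnt", "l_arm_jnt2"}
print(unique_name("l_arm_jnt", taken))  # l_arm_jnt3
print(unique_name("r_arm_jnt", taken))  # r_arm_jnt
```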

How to make this Django attribute name search better?

lcount = Open_Layers.objects.all()
form = SearchForm()
if request.method == 'POST':
    form = SearchForm(request.POST)
    if form.is_valid():
        data = form.cleaned_data
        val = form.cleaned_data['LayerName']
        a = Open_Layers()
        data = []
        for e in lcount:
            if e.Layer_name == val:
                data = val
        return render_to_response('searchresult.html', {'data': data})
    else:
        form = SearchForm()
else:
    return render_to_response('mapsearch.html', {'form': form})
This only returns a result when a particular "name" matches exactly. How do I change it so that a search for "Park" returns Park1, Park2, Parking, Parkin, i.e. all the occurrences of "park"?
You can improve your searching logic by using a list to accumulate the results and the re module to match a larger set of words.
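A minimal sketch of that idea, outside Django: `names` stands in for the `Layer_name` values of `Open_Layers.objects.all()`, and `search_names` is a hypothetical helper. `re.escape` keeps characters such as `.` or `*` in the user's query from being treated as regex syntax, and `re.IGNORECASE` lets "Park" also match "carpark".

```python
import re
from typing import List


def search_names(query: str, names: List[str]) -> List[str]:
    """Return every name that contains query, ignoring case."""
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    results = []  # accumulate every hit instead of overwriting a single value
    for name in names:
        if pattern.search(name):
            results.append(name)
    return results


print(search_names("Park", ["Park1", "Park2", "Parking", "Lake", "carpark"]))
# ['Park1', 'Park2', 'Parking', 'carpark']
```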
However, this is still pretty limited, error-prone, hard to maintain, and even harder to evolve. Plus you'll never get results as nice as you would with a search engine.
So instead of trying to manually reinvent the wheel, the car and the highway, you should spend some time setting up Haystack. It is now the de facto standard for search in Django.
Use Whoosh as a backend at first; it's going to be easier. If your search gets slow, replace it with Solr.
EDIT:
Simple, clean alternative:
Open_Layers.objects.filter(Layer_name__icontains=val)
This performs a case-insensitive SQL LIKE, adding the % wildcards around val for you.
It can get slow on large tables if used heavily, since a pattern with a leading % can't use an ordinary index, but I guess this is probably not going to be an issue with your current project.
BTW, you probably want to rename Open_Layers to OpenLayers, as that is the PEP 8 naming convention for classes.
Instead of
if e.Layer_name == val:
    data = val
use
if val in e.Layer_name:
    data.append(e.Layer_name)
(and you don't need the line data = form.cleaned_data)
I realise this is an old post, but anyway:
There's a fuzzy logic string comparison already in the python standard library.
import difflib
Mainly have a look at:
difflib.SequenceMatcher(None, a='string1', b='string2', autojunk=True).ratio()
more info here:
http://docs.python.org/library/difflib.html#sequencematcher-objects
It returns a ratio between 0 and 1 of how similar the two strings are. So instead of testing whether they're equal, you choose a similarity threshold.
One thing to watch out for: you may want to convert both strings to lowercase:
string1.lower()
Also note that you may want to implement your favourite method of splitting the string, e.g. .split() or something using re, so that a search for 'David' against 'David Brent' ranks higher.
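Combining those tips, here is a sketch that lowercases both sides, splits each candidate into words, and scores it by its best-matching word. The helper names, the splitting regex and the 0.6 threshold are assumptions to tune for your data. Without the split, `SequenceMatcher(None, "david", "david brent").ratio()` is only 0.625; with it, 'David' scores 1.0 against 'David Brent'.

```python
import difflib
import re
from typing import List, Tuple


def best_ratio(query: str, candidate: str) -> float:
    """Highest SequenceMatcher ratio between query and any word of candidate."""
    query = query.lower()
    words = re.split(r"[\s_\-]+", candidate.lower())
    return max((difflib.SequenceMatcher(None, query, w).ratio() for w in words if w),
               default=0.0)


def fuzzy_search(query: str, names: List[str], threshold: float = 0.6) -> List[Tuple[float, str]]:
    """Return (score, name) pairs at or above threshold, best match first."""
    scored = [(best_ratio(query, n), n) for n in names]
    hits = [(score, n) for score, n in scored if score >= threshold]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return hits


print(best_ratio("David", "David Brent"))  # 1.0
print([n for _, n in fuzzy_search("park", ["Park1", "Parking", "Lake"])])
# ['Park1', 'Parking']
```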