How to Reduce Code Duplication of If-Else Statements in Python - python-2.7

I'm a student with the very bad habit of duplicating code all over the place, which is something I want to change.
Below, I have a snippet of code from a function I'm writing. Quick explanation: The code would look at an HR website for a person, and return info about the employees he's managing (assuming he manages anyone).
There are two types of employees: regular employees and contract workers. On the website, regular employees underneath the manager would all be listed under employeeList, and the contractors would be listed under contractWorkerList.
response = opener.open('myFakeOrgHierarchy.com/JohnSmith_The_Manager')
allDataFromPage = (response.read())
jsonVersionOfAllData = json.loads(allDataFromPage)
listOfAllReports = []
numOfEmployeeDirectReports = len(jsonVersionOfAllData['employeeList']['list'])
numOfContractWorkerReports = len(jsonVersionOfAllData['contractWorkerList']['list'])
if numOfEmployeeDirectReports != 0:
    for i in range(0, numOfEmployeeDirectReports, 1):
        workerInfo = {}
        workerInfo['empLname'] = jsonVersionOfAllData['employeeList']['list'][i]['lastName']
        workerInfo['empFname'] = jsonVersionOfAllData['employeeList']['list'][i]['firstName']
        listOfAllReports.append(workerInfo)
if numOfContractWorkerReports != 0:
    for i in range(0, numOfContractWorkerReports, 1):
        workerInfo = {}
        workerInfo['empLname'] = jsonVersionOfAllData['contractWorkerList']['list'][i]['lastName']
        workerInfo['empFname'] = jsonVersionOfAllData['contractWorkerList']['list'][i]['firstName']
        listOfAllReports.append(workerInfo)
As you can see, I have several lines of code that are almost identical to other lines, with only small variations. Is there a way to check both contractWorkerList and employeeList to see if they're not empty, and (assuming they're not empty) go through both contractWorkerList and employeeList and grab values without duplicating the code?
(Since I'm a relative beginner, any simple examples you could provide with your recommendations would be much appreciated)

For starters, every time you see something duplicated, think about creating a variable up front and using that. After that, you can decide what should be factored out into a function. Below, I just removed most of the duplicated items.
response = opener.open('myFakeOrgHierarchy.com/JohnSmith_The_Manager')
allDataFromPage = (response.read())
jsonVersionOfAllData = json.loads(allDataFromPage)
listOfAllReports = []
for listType in ('employeeList', 'contractWorkerList'):
    json_ver = jsonVersionOfAllData[listType]['list']
    directReports = len(json_ver)
    if directReports != 0:
        for i in range(0, directReports, 1):
            workerInfo = {}
            for wi_name, json_name in (('empLname', 'lastName'), ('empFname', 'firstName')):
                workerInfo[wi_name] = json_ver[i][json_name]
            listOfAllReports.append(workerInfo)

The most common way of avoiding code duplication is to define a function that contains the shared code.
def checkIfEmpty(numOfReports, listName):
    if numOfReports != 0:
        for i in range(0, numOfReports, 1):
            workerInfo = {}
            workerInfo['empLname'] = jsonVersionOfAllData[listName]['list'][i]['lastName']
            workerInfo['empFname'] = jsonVersionOfAllData[listName]['list'][i]['firstName']
            listOfAllReports.append(workerInfo)
You then end up with simple, easy-to-read code:
checkIfEmpty(numOfEmployeeDirectReports, 'employeeList')
checkIfEmpty(numOfContractWorkerReports, 'contractWorkerList')

In this particular scenario, you could do something like this:
for var, key in [(numOfEmployeeDirectReports, 'employeeList'),
                 (numOfContractWorkerReports, 'contractWorkerList')]:
    if var != 0:
        for i in range(0, var, 1):
            workerInfo = {}
            workerInfo['empLname'] = jsonVersionOfAllData[key]['list'][i]['lastName']
            workerInfo['empFname'] = jsonVersionOfAllData[key]['list'][i]['firstName']
            listOfAllReports.append(workerInfo)
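
A further simplification, sketched here under a few assumptions: iterating over each list directly removes both the index variable and the emptiness check, because looping over an empty list simply does nothing. The function name collect_reports and the sample JSON are made up for illustration; only the key names come from the question.

```python
import json

# Hypothetical sample payload; only the key names are taken from the question.
sample_page = '''
{"employeeList": {"list": [{"lastName": "Doe", "firstName": "Jane"}]},
 "contractWorkerList": {"list": []}}
'''

def collect_reports(data, list_names=('employeeList', 'contractWorkerList')):
    """Return one {'empLname', 'empFname'} dict per worker in the named lists."""
    reports = []
    for list_name in list_names:
        # An empty list yields no iterations, so no explicit length check is needed.
        for worker in data[list_name]['list']:
            reports.append({'empLname': worker['lastName'],
                            'empFname': worker['firstName']})
    return reports

listOfAllReports = collect_reports(json.loads(sample_page))
```

Supporting a third kind of worker list would then only mean adding its key to list_names.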

Related

Who will review my first text coding which resulted in indentation error?

def classify0(inX, dataSet, labels, k):
dataSetSize = dataset.shpe[0]
diffMat = tile(inX, (dataSetSize, 1)) - dataSet
sqDiffMat = diffMat ** 2
sqDistances = sqDiffMat.sum(axis = 1)
distances = sqDistances ** 0.5
sortedDistIndicies = distances.argsort()
classCount={}
for i in range(k):
voteIlabel = labels[sortedDistIndicies[i]]
classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1
sortedClassCount = sorted(classCount.interitems(),
key=operator.itemgetter(1), reverse=true)
return sortedClassCount[0][0]
Above is my first Python code. I imported it in the terminal and it says 'IndentationError: unindent does not match any outer indentation level'. I don't know how to fix it. Please help me.
Your code at:
sortedClassCount = sorted(classCount.interitems(),
key=operator.itemgetter(1), reverse=true)
return sortedClassCount[0][0]
seems to jump to a different indentation level for no reason. Remove the extra spaces so that these lines line up with the rest of the function body, like so:
    sortedClassCount = sorted(classCount.interitems(),
                              key=operator.itemgetter(1), reverse=true)
    return sortedClassCount[0][0]
Python has no curly braces {} to mark where a function starts and ends, so the interpreter uses the indentation level to tell blocks apart.
This is why your code doesn't work.
Try deleting all the indentation and putting it back, because you might have mixed spaces with tabs, or accidentally added an extra space where there shouldn't be one.
Once the indentation is fixed, a few typos will still raise errors: dataset.shpe should be dataSet.shape, interitems should be iteritems, and true should be True.
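
To see what triggers this particular message, here is a small sketch that compiles two versions of a toy function from strings. The function total and the helper indentation_problem are made up for illustration and are not part of the question's code. The only difference between the two sources is whether the return line is indented by three spaces or four.

```python
# Source with a deliberately bad dedent: the last line uses 3 spaces,
# which matches neither the function body (4) nor the loop body (8).
bad_source = (
    "def total(n):\n"
    "    result = 0\n"
    "    for i in range(n):\n"
    "        result += i\n"
    "   return result\n"
)

# Same source with the return line aligned to the function body.
good_source = bad_source.replace("   return", "    return")

def indentation_problem(source):
    """Return the IndentationError message for source, or None if it compiles."""
    try:
        compile(source, "<example>", "exec")
    except IndentationError as exc:
        return exc.msg
    return None

bad_message = indentation_problem(bad_source)
good_message = indentation_problem(good_source)
```

The interpreter only complains when a line dedents to a column that no enclosing block started at, which is why a single stray space is enough.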

How to calculate all possibilities of very large string matrixes timely?

Let's say I have a bunch of objects in different classifications, and I need to know all the possible combinations of these objects. I end up with an input that looks like this:
{'raw':[{'AH':['P','C','R','Q','L']},
        {'BG':['M','A','S','B','F']},
        {'KH':['E','V','G','N','Y']},
        {'KH':['E','V','G','N','Y']},
        {'NM':['1','2','3','4','5']}]}
The keys AH, BG, KH, and NM are groups, and the values are lists that hold the individual objects. A finished group consists of one member from each list. In this example KH is listed twice, so each finished group has two members of KH in it. I built something that handles this; it looks like this:
import logging

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)

class Builder():
    def __init__(self, data):
        self.raw = data['raw']
        node = []
        for item in self.raw:
            for k in item.keys():
                node.append({k:0})
        logger.debug('node: %s' % node)
        #Parse out groups#
        self.groups = []
        increment = -2
        while True:
            try:
                assert self.raw[increment].values()[0][node[increment][node[increment].keys()[0]]]
                increment = -2
                for x in self.raw[-1].values()[0]:
                    group = []
                    for k in range(0,len(node[:-1])):
                        position = node[k].keys()[0]
                        player = self.raw[k].values()[0][node[k][node[k].keys()[0]]]
                        group.append({position:player})
                    group.append({self.raw[-1].keys()[0]:x})
                    if self.repeatRemovals(group):
                        self.groups.append(group)
                node[increment][node[increment].keys()[0]]+=1
            except IndexError:
                node[increment][node[increment].keys()[0]] = 0
                increment-=1
                try:
                    node[increment][node[increment].keys()[0]]+=1
                except IndexError:
                    break
        for group in self.groups:
            logger.debug(group)
    def repeatRemovals(self, group):
        for x in range(0, len(group)):
            for y in range(0, len(group)):
                if group[x].values()[0] == group[y].values()[0] and x != y:
                    return False
        return True
if __name__ == '__main__':
    groups = Builder({'raw':[{'AH':['P','C','R','Q','L']},
                             {'BG':['M','A','S','B','F']},
                             {'KH':['E','V','G','N','Y']},
                             {'KH':['E','V','G','N','Y']},
                             {'NM':['1','2','3','4','5']}]})
    logger.debug("Total groups: %d" % len(groups.groups))
The output of running this should make my intended goal clear, if I have failed to do so in text. My concern is the time it takes to handle large classifications of objects: for example, when one classification has some 40 objects in it and appears in the matrix three times, and there are 7 other classifications of comparable size. I think the numpy library could help me, but I am new to scientific programming and am not sure how to go about it, or whether it would be worth it. Could anyone provide some insight? Thank you.
Try this:
Remove the duplicated values.
Calculate all the possibilities using permutations and factorials.
For example:
https://www.youtube.com/watch?v=Oc50d2GqXx0
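
One way to act on this advice in Python is itertools.product from the standard library, which yields every combination of one item per list without hand-written counters. The sketch below is an assumption about what would fit the question, not the asker's method, and the name build_groups is made up here. It keeps the question's rule that an object may not appear twice in a group, and like the original it treats (E, V) and (V, E) as different groups.

```python
from itertools import product

raw = [
    {'AH': ['P', 'C', 'R', 'Q', 'L']},
    {'BG': ['M', 'A', 'S', 'B', 'F']},
    {'KH': ['E', 'V', 'G', 'N', 'Y']},
    {'KH': ['E', 'V', 'G', 'N', 'Y']},
    {'NM': ['1', '2', '3', '4', '5']},
]

def build_groups(raw):
    """Return every group with one object per entry and no repeated object."""
    # Each entry is a single-key dict: classification name -> list of objects.
    positions = [list(item.keys())[0] for item in raw]
    pools = [list(item.values())[0] for item in raw]
    groups = []
    for combo in product(*pools):
        # Skip combinations that pick the same object twice (e.g. E and E from KH).
        if len(set(combo)) == len(combo):
            groups.append(list(zip(positions, combo)))
    return groups

groups = build_groups(raw)
total_groups = len(groups)
```

This does not reduce the number of combinations, so very large inputs will still take a long time to enumerate. If only the total is needed, it can often be computed arithmetically instead of by building every group.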

Creating a if/else that appends data from mult. scraped pages if counts differ?

I'm trying to scrape Oregon teacher licensure information that looks like this or this (this is publicly available data).
This is my code:
for t in range(0,2): #Refers to txt file with ids
    address = 'http://www.tspc.oregon.gov/lookup_application/LDisplay_Individual.asp?id=' + lines2[t]
    page = requests.get(address)
    tree = html.fromstring(page.text)
    count = 0
    for license_row in tree.xpath(".//tr[td[1] = 'License Type']/following-sibling::tr[1]"):
        license_data = license_row.xpath(".//td/text()")
        count = count + 1
        if count==1:
            ltest1.append(license_data)
        if count==2:
            ltest2.append(license_data)
        if count==3:
            ltest3.append(license_data)
with open('teacher_lic.csv', 'wb') as pensionfile:
    writer = csv.writer(pensionfile, delimiter="," )
    writer.writerow(["Name", "Lic1", "Lic2", "Lic3"])
    pen = zip(lname, ltest1, ltest2, ltest3)
    for penlist in pen:
        writer.writerow(list(penlist))
The problem occurs in a case like this: Teacher A has 13 licenses and Teacher B has 2, so the total count is 13 for A and 2 for B. When I get to Teacher B and count would equal 3, I want to say, "if count==3 then ltest3.append(license_data), else if count==3 and license_data=='' then ltest3.append('')". But since count never reaches 3 for B, there's no way to tell it to append an empty value.
I'd want the output to look like this:
Is there a way to do this? I might be approaching this completely wrong so if someone can point me in another direction, that would be helpful as well.
There's probably a more elegant way to do this but this managed to work pretty well.
I created some blank spaces to fill in when Teacher A has 13 licenses and Teacher B has 2. There were some errors that resulted when the license_row.xpath got to the count==3 in Teacher B. I exploited these errors to create the ltest3.append('').
for t in range(0, 2): #Each txt file contains differing amounts
    address = 'http://www.tspc.oregon.gov/lookup_application/LDisplay_Individual.asp?id=' + lines2[t]
    page = requests.get(address)
    tree = html.fromstring(page.text)
    count = 0
    test = tree.xpath(".//tr[td[1] = 'License Type']/following-sibling::tr[1]")
    difference = 15 - len(test)
    for i in range(0, difference):
        test.append('')
    for license_row in test:
        count = count + 1
        try:
            license_data = license_row.xpath(".//td/text()")
        except NameError:
            license_data = ''
            if license_data=='' and count==1:
                ltest1.append('')
            if license_data=='' and count==2:
                ltest2.append('')
            if license_data=='' and count==3:
                ltest3.append('')
        except AttributeError:
            license_data = ''
        if count==1 and True:
            print "True"
        if count==1:
            ltest1.append(license_data)
        if count==2 and True:
            print "True"
        if count==2:
            ltest2.append(license_data)
        if count==3 and True:
            print "True"
        if count==3:
            ltest3.append(license_data)
        del license_data
    for endorse_row in tree.xpath(".//tr[td = 'Endorsements']/following-sibling::tr"):
        endorse_data = endorse_row.xpath(".//td/text()")
        lendorse1.append(endorse_data)
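
The padding idea can also be expressed without relying on exceptions. The sketch below is a simplified alternative, not the answer's code: it assumes the licenses for each teacher have already been collected into a list, and the helper pad_row and the sample data are hypothetical. Each list is cut or padded to exactly three entries before it becomes a CSV row.

```python
def pad_row(values, width, fill=''):
    """Return exactly `width` items: truncate long lists, pad short ones."""
    row = list(values[:width])
    row.extend([fill] * (width - len(row)))
    return row

# Hypothetical scraped data: one list of license entries per teacher.
licenses_by_teacher = {
    'Teacher A': ['License %d' % n for n in range(1, 14)],  # 13 licenses
    'Teacher B': ['License 1', 'License 2'],                 # 2 licenses
}

# One row per teacher: name followed by exactly three license columns.
rows = [[name] + pad_row(licenses_by_teacher[name], 3)
        for name in sorted(licenses_by_teacher)]
```

Each entry in rows can then be passed to writer.writerow, so every teacher gets the same number of columns no matter how many licenses were found.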

How to sort python lists due to certain criteria

I would like to sort a list or an array using Python to achieve the following:
Say my initial list is:
example_list = ["retg_1_gertg","fsvs_1_vs","vrtv_2_srtv","srtv_2_bzt","wft_3_btb","tvsrt_3_rtbbrz"]
I would like to get all the elements that have 1 behind the first underscore together in one list and the ones that have 2 together in one list and so on. So the result should be:
sorted_list = [["retg_1_gertg","fsvs_1_vs"],["vrtv_2_srtv","srtv_2_bzt"],["wft_3_btb","tvsrt_3_rtbbrz"]]
My code:
import numpy as np
import string
example_list = ["retg_1_gertg","fsvs_1_vs","vrtv_2_srtv","srtv_2_bzt","wft_3_btb","tvsrt_3_rtbbrz"]
def sort_list(imagelist):
    # get number of wafers
    waferlist = []
    for image in imagelist:
        wafer_id = string.split(image,"_")[1]
        waferlist.append(wafer_id)
    waferlist = set(waferlist)
    waferlist = list(waferlist)
    number_of_wafers = len(waferlist)
    # create list
    sorted_list = []
    for i in range(number_of_wafers):
        sorted_list.append([])
    for i in range(number_of_wafers):
        wafer_id = waferlist[i]
        for image in imagelist:
            if string.split(image,"_")[1] == wafer_id:
                sorted_list[i].append(image)
    return sorted_list
sorted_list = sort_list(example_list)
This works, but it is really awkward and it involves many for loops that slow everything down if the lists are large.
Is there any more elegant way using numpy or anything?
Help is appreciated. Thanks.
I'm not sure how much more elegant this solution is, but it is a bit more efficient. You could first sort the list and then go through it and split it into the final set of sorted lists:
example_list = ["retg_1_gertg","fsvs_1_vs","vrtv_2_srtv","srtv_2_bzt","wft_3_btb","tvsrt_3_rtbbrz"]
# Sort by the single character that follows the first underscore.
sorted_list = sorted(example_list, key=lambda x: x[x.index('_')+1])
result = [[]]
current_num = sorted_list[0][sorted_list[0].index('_')+1]
index = 0
# Iterate over the sorted list so that equal keys are adjacent.
for i in sorted_list:
    if current_num != i[i.index('_')+1]:
        current_num = i[i.index('_')+1]
        index += 1
        result.append([])
    result[index].append(i)
print result
If you can make assumptions about the values after the first underscore character, you could clean it up a bit (for example, if you knew that they would always be sequential numbers starting at 1).
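
Another option that needs no sorting pass over the names and no assumptions about the IDs is to collect the names in a dictionary keyed by the ID. This is a minimal sketch using collections.defaultdict from the standard library; the function name group_by_wafer is made up for illustration. It assumes, as in the question, that the ID is the text between the first and second underscore.

```python
from collections import defaultdict

example_list = ["retg_1_gertg", "fsvs_1_vs", "vrtv_2_srtv",
                "srtv_2_bzt", "wft_3_btb", "tvsrt_3_rtbbrz"]

def group_by_wafer(names):
    """Group names by the field after the first underscore, in a single pass."""
    groups = defaultdict(list)
    for name in names:
        wafer_id = name.split("_")[1]
        groups[wafer_id].append(name)
    # The IDs are strings, so they are ordered as text here.
    return [groups[key] for key in sorted(groups)]

sorted_list = group_by_wafer(example_list)
```

Because the IDs are compared as strings, "10" would come before "2". If the IDs are always numeric, sorted(groups, key=int) orders them numerically instead.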

I need help making a GUI with python, Glade,and GTK

I have a program that encodes and decodes messages with a key, but I want to make it look nicer and more professional. My code is as follows:
from random import seed, shuffle
#Encoder Function
def Encoder(user_input,SEED):
    user_input = user_input.lower()
    letter = ["a","b","c","d","e","f","g","h","i","j","k",'l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']
    Letter_code = {"a":0,"b":1,"c":2,"d":3,"e":4,"f":5,"g":6,"h":7,"i":8,"j":9,"k":10,'l':11,'m':12,'n':13,'o':14,'p':15,'q':16,'r':17,'s':18,'t':19,'u':20,'v':21,'w':22,'x':23,'y':24,'z':25}
    code = ["a","b","c","d","e","f","g","h","i","j","k",'l','m','n','o','p','q','r','s','t','u','v','w','x','y','z',]
    n = []
    seed(SEED)
    shuffle(code)
    for letter in user_input:
        for let in letter:
            if letter != " ":
                if letter == let:
                    first = Letter_code[let]
                    n.append(code[first])
            else:
                n.append("~")
    return ''.join(n)
#Decoder Function
def Decoder(user_input,SEED):
    # lower must be called; without the parentheses user_input would be a method object
    user_input = user_input.lower()
    key_list = ["a","b","c","d","e","f","g","h","i","j","k",'l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']
    final = ["a","b","c","d","e","f","g","h","i","j","k",'l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']
    seed(SEED)
    shuffle(key_list)
    key_code = {}
    z = 0
    n = []
    for key in key_list:
        key_code[key] = z
        z += 1
    for let in user_input:
        if let != "~":
            for Ke in key_list:
                if let == Ke:
                    a = key_code[Ke]
                    n.append(final[a])
        else:
            n.append(" ")
    return ''.join(n)
I want a GUI with two entry boxes, one for the message and the other for the key, and two buttons, one labeled Encode and the other Decode. It should also have a place where the final message is printed and can be copied by the user. I would greatly appreciate it if someone could help me with this.
The following Glade tutorials may help you.
http://www.overclock.net/t/342279/tutorial-using-python-glade-to-create-a-simple-gui-application
https://wiki.gnome.org/Glade/Tutorials
http://www.pygtk.org/articles/pygtk-glade-gui/Creating_a_GUI_using_PyGTK_and_Glade.htm
As for converting the .py to an exe, you can use py2exe, please take a look at this answer - https://stackoverflow.com/a/14165470/2689986