Counting rows per record - list

I'm trying to count the total number of invoices where the record is consecutive.

I'm not sure exactly what you want, so I'm basing my answer on your code.
There are three errors in the code:
Function parameters should be variable names, not quoted strings.
def count_consecutive_invoice(df, invoiceNumber):
    retlist = []
    for i in range(len(df[invoiceNumber]) - 1):
Notice the removal of the quotes around invoiceNumber. You set its value later, when you call the function.
You are indexing the function itself instead of the DataFrame column:
if count_consecutive_invoice[i] + 1 == count_consecutive_invoice[i + 1]:
It should be:
if df[invoiceNumber][i] + 1 == df[invoiceNumber][i + 1]:
You need to declare all variables, including count.
To do this, just add count = 1 after retlist = [].
This code should work:
import pandas as pd

df = pd.read_excel(r'MYPath\Book1.xlsx')

def count_consecutive_invoice(df, invoiceNumber):
    retlist = []
    count = 1
    for i in range(len(df[invoiceNumber]) - 1):
        # Check if the next number is consecutive
        if df[invoiceNumber][i] + 1 == df[invoiceNumber][i+1]:
            count += 1
        elif count > 1:
            # If it is not and count > 1, append the count and restart counting
            retlist.append(count)
            count = 1
    # Since we stopped the loop one early, append the last count
    retlist.append(count)
    return retlist

output = count_consecutive_invoice(df, 'Invoice Number')
print(output)
output:
[4]
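The same counting idea can be tried without a spreadsheet. The sketch below works on a plain list of numbers; the function name consecutive_run_lengths and the min_length parameter are made up for illustration, and unlike the function above it only reports runs of at least min_length numbers.

```python
def consecutive_run_lengths(numbers, min_length=2):
    """Return the length of each run of consecutive integers in `numbers`."""
    runs = []
    count = 0
    previous = None
    for number in numbers:
        if previous is not None and number == previous + 1:
            # Still inside a run of consecutive numbers
            count += 1
        else:
            # The run (if any) ended: record it and start a new one
            if count >= min_length:
                runs.append(count)
            count = 1
        previous = number
    # Record the run that was still open when the data ended
    if count >= min_length:
        runs.append(count)
    return runs


single_run = consecutive_run_lengths([1, 2, 3, 4])
two_runs = consecutive_run_lengths([1, 2, 5, 6, 7, 9])
print(single_run, two_runs)
```

With a DataFrame, something like consecutive_run_lengths(df['Invoice Number'].tolist()) should give the same kind of result, assuming the column holds integers.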

Here is my commented solution.
It builds a dict that can be turned back into a pandas DataFrame. You need to pass the name of the invoice column on which we count and the name of the ID column.
def count_consecutive_invoice(table, invoice_row_name, id_row_name):
    invoiced_table = {}  # the output
    for row in table:
        if row != invoice_row_name:
            invoiced_table[row] = []
    invoiced_table['Count of Consecutive Invoices'] = []
    streak = False  # track streaks, because the first consecutive pair adds 2, not 1
    for line in range(len(table[invoice_row_name]) - 1):
        record_id = table[id_row_name][line]
        if record_id not in invoiced_table[id_row_name]:
            # First time we see this ID: copy its row and start its count at 0
            for row in table:
                if row != invoice_row_name:
                    invoiced_table[row].append(table[row][line])
            invoiced_table['Count of Consecutive Invoices'].append(0)
        if record_id == table[id_row_name][line+1]:  # same vendor ID on the next line, so counts stay per vendor
            if table[invoice_row_name][line]+1 == table[invoice_row_name][line+1]:  # check the actual invoice numbers
                itable_line = invoiced_table[id_row_name].index(record_id)
                # add 2 at the start of a streak, 1 afterwards
                invoiced_table['Count of Consecutive Invoices'][itable_line] += 1 + int(not streak)
                streak = True
                continue
        streak = False
    return invoiced_table

invoiced = count_consecutive_invoice(df, "Invoice ID", "Vendor ID")
print(pd.DataFrame.from_dict(invoiced))
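For comparison, here is a minimal sketch of the same per-vendor counting on plain (vendor ID, invoice number) pairs, with no pandas involved. The function name, the tuple layout and the sample rows are assumptions made for this example, not part of the original data.

```python
from collections import defaultdict


def consecutive_invoice_counts(rows):
    """rows: iterable of (vendor_id, invoice_number) pairs in file order.

    Returns {vendor_id: number of invoices that belong to a consecutive run}.
    """
    counts = defaultdict(int)
    previous = None
    in_streak = False
    for vendor_id, invoice in rows:
        counts[vendor_id] += 0  # make sure every vendor appears in the result
        if previous == (vendor_id, invoice - 1):
            # The first pair of a streak contributes two invoices, later ones one
            counts[vendor_id] += 1 if in_streak else 2
            in_streak = True
        else:
            in_streak = False
        previous = (vendor_id, invoice)
    return dict(counts)


# Hypothetical sample data
sample_rows = [("A", 1), ("A", 2), ("A", 3), ("A", 7),
               ("B", 10), ("B", 12),
               ("C", 4), ("C", 5)]
per_vendor = consecutive_invoice_counts(sample_rows)
print(per_vendor)
```

Unlike the loop above, this version also looks at the last row, because it compares each row with the previous one rather than with the next one.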

for loop list index out of range

I am reading a file, all data in the file is read and put in the list called "data" like below.
['43629', '7', '4', 'Runtime Error', '0', '879', '12:20:52'],
['43628', '31', '3', 'Runtime Error', '0', '521', '12:20:38']...
I want to change the time from a str to an instance of my Time class, but the for loop raises "list index out of range". The length of ['12','20','52'] is 3, so why don't time[0], time[1], time[2] work?
class Time:
    def __init__(self, hour, minute, second):
        self.hour = hour
        self.minute = minute
        self.second = second

data = []
for line in file.readlines():
    line = line.strip('\n')
    line = line.split(',')
    data.append(line)
for i in (1, len(data)+1):
    time = data[i][6].split(':')
    data[i][6] = Time(time[0], time[1], time[2])
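I think you missed two points.
First, range is missing: (1, len(data) + 1) is just a tuple, so the loop only visits i = 1 and i = len(data) + 1, and the second of those is past the end of the list. That is the index that is out of range, not time[0], time[1] or time[2].
Second, i must run from 0 to len(data) - 1.
for i in range(len(data)):
    time = data[i][6].split(':')
    data[i][6] = Time(time[0], time[1], time[2])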
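To see the difference without the original file, here is a small self-contained sketch. The two sample lines are taken from the question; reading them from a list instead of a file, and converting the parts to int, are assumptions made for the example.

```python
class Time:
    def __init__(self, hour, minute, second):
        self.hour = hour
        self.minute = minute
        self.second = second


# Stand-in for the lines of the file
lines = [
    "43629,7,4,Runtime Error,0,879,12:20:52\n",
    "43628,31,3,Runtime Error,0,521,12:20:38\n",
]
data = [line.strip("\n").split(",") for line in lines]

# A tuple is iterated element by element; range produces every index
tuple_indices = [i for i in (1, len(data) + 1)]
range_indices = list(range(len(data)))
print(tuple_indices, range_indices)

# Iterating over the rows directly avoids index arithmetic altogether
for row in data:
    hour, minute, second = (int(part) for part in row[6].split(":"))
    row[6] = Time(hour, minute, second)

print(data[0][6].hour, data[0][6].minute, data[0][6].second)
```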

How do I apply a function to only a certain value in a list?

So I have to make a version update program that updates the values in a version number. For 11.3.4.5, I want my function to increment the number at a given index and then set all the remaining values to 0. So if the index is 0, it would increment the first value of the list, and the new version would be 12.0.0.0. If someone could just show me how to set it up, that would be great. This is what I have so far, but I'm so stuck:
def updateVersion(numbers, index):
    version = []
    index =
    for i in numbers:
        if any(version):
            i + 1
    return version
Say you provide the version as a list:
def updateVersion(currVersion, index):
    # Keep everything before index, increment the value at index, zero the rest
    return currVersion[:index] + [currVersion[index] + 1] + [0] * (len(currVersion) - index - 1)
I'm not sure if I understood your question correctly, but I would do something like this:
current_version = [11, 3, 4, 5]

def updateVersion(version, index):
    i = 0
    # Create a placeholder list with the same length as the version number
    new_version = [None] * len(version)
    for number in version:
        if i == index:  # increment the version number at the index
            new_version[i] = version[i] + 1
        elif i > index:  # all numbers after the increment are 0
            new_version[i] = 0
        else:  # numbers before the index stay unchanged
            new_version[i] = version[i]
        i = i + 1
    return new_version

print(str(updateVersion(current_version, 0)))  # just for testing
So here the output would be:
[12, 0, 0, 0]
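If the version arrives as a dotted string such as "11.3.4.5" rather than a list, the same idea can be sketched as below. The function name bump_version and the choice to raise IndexError for a bad index are assumptions for this example.

```python
def bump_version(version, index):
    """Increment the part at `index` and reset every later part to 0."""
    parts = [int(part) for part in version.split(".")]
    if not 0 <= index < len(parts):
        raise IndexError("index %d is outside the version %r" % (index, version))
    bumped = parts[:index] + [parts[index] + 1] + [0] * (len(parts) - index - 1)
    return ".".join(str(part) for part in bumped)


print(bump_version("11.3.4.5", 0))
print(bump_version("11.3.4.5", 2))
```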

Creating a if/else that appends data from mult. scraped pages if counts differ?

I'm trying to scrape Oregon teacher licensure information that looks like this or this (this is publicly available data).
This is my code:
for t in range(0,2): #Refers to txt file with ids
    address = 'http://www.tspc.oregon.gov/lookup_application/LDisplay_Individual.asp?id=' + lines2[t]
    page = requests.get(address)
    tree = html.fromstring(page.text)
    count = 0
    for license_row in tree.xpath(".//tr[td[1] = 'License Type']/following-sibling::tr[1]"):
        license_data = license_row.xpath(".//td/text()")
        count = count + 1
        if count==1:
            ltest1.append(license_data)
        if count==2:
            ltest2.append(license_data)
        if count==3:
            ltest3.append(license_data)
with open('teacher_lic.csv', 'wb') as pensionfile:
    writer = csv.writer(pensionfile, delimiter="," )
    writer.writerow(["Name", "Lic1", "Lic2", "Lic3"])
    pen = zip(lname, ltest1, ltest2, ltest3)
    for penlist in pen:
        writer.writerow(list(penlist))
The problem occurs in cases like this: Teacher A has 13 licenses and Teacher B has 2, so the total count is 13 for A and 2 for B. When I get to Teacher B, I want to say "if count==3 then ltest3.append(license_data), else if count==3 and license_data=='' then ltest3.append('')", but since count never reaches 3 for B there's no way to tell it to append an empty value.
I'd want the output to look like this:
Is there a way to do this? I might be approaching this completely wrong so if someone can point me in another direction, that would be helpful as well.
There's probably a more elegant way to do this, but this managed to work pretty well.
I padded the list of rows with empty strings to cover the case where Teacher A has 13 licenses and Teacher B has 2. Calling license_row.xpath on one of those padding strings raises an error once the loop reaches count==3 for Teacher B, and I exploited that error to run ltest3.append('').
for t in range(0, 2): #Each txt file contains differing amounts
    address = 'http://www.tspc.oregon.gov/lookup_application/LDisplay_Individual.asp?id=' + lines2[t]
    page = requests.get(address)
    tree = html.fromstring(page.text)
    count = 0
    test = tree.xpath(".//tr[td[1] = 'License Type']/following-sibling::tr[1]")
    difference = 15 - len(test)
    for i in range(0, difference):
        test.append('')
    for license_row in test:
        count = count + 1
        try:
            license_data = license_row.xpath(".//td/text()")
        except NameError:
            license_data = ''
            if license_data=='' and count==1:
                ltest1.append('')
            if license_data=='' and count==2:
                ltest2.append('')
            if license_data=='' and count==3:
                ltest3.append('')
        except AttributeError:
            license_data = ''
        if count==1 and True:
            print("True")
        if count==1:
            ltest1.append(license_data)
        if count==2 and True:
            print("True")
        if count==2:
            ltest2.append(license_data)
        if count==3 and True:
            print("True")
        if count==3:
            ltest3.append(license_data)
        del license_data
    for endorse_row in tree.xpath(".//tr[td = 'Endorsements']/following-sibling::tr"):
        endorse_data = endorse_row.xpath(".//td/text()")
        lendorse1.append(endorse_data)
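A possibly simpler way to get the same effect, without relying on exceptions, is to collect each teacher's licenses into one list and then truncate or pad that list to a fixed width. The sketch below uses made-up data in place of the scraped pages; pad_to_width and the scraped list are hypothetical names.

```python
def pad_to_width(values, width, fill=""):
    """Return exactly `width` items: drop extras, pad a short list with `fill`."""
    trimmed = list(values[:width])
    return trimmed + [fill] * (width - len(trimmed))


# Hypothetical scraped results: (teacher name, licenses found on the page)
scraped = [
    ("Teacher A", ["License %d" % n for n in range(1, 14)]),  # 13 licenses
    ("Teacher B", ["License 1", "License 2"]),                 # 2 licenses
]

# One row per teacher, always with a name plus three license columns
rows = [[name] + pad_to_width(licenses, 3) for name, licenses in scraped]
for row in rows:
    print(row)
```

Each row then has the same number of columns, so it can be handed to csv.writer's writerow as it is.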

Why won't my csv list replace my blank values with "N"?

I'm attempting to create a function that reads a specific column of a csv file, which currently alternates between empty values and "1", pops the values into a list, and then replaces each empty value with "N" and each "1" with "B". I'm pretty new to Python, as well as programming in general, so any tips and all help are welcome. This is what I have so far. It runs, but it only replaces my "1"s with "B"s. I've double-checked my csv and the position is definitely empty and does not contain spaces. I've also looked at other responses and tried to emulate the logic behind them, but something still doesn't work. If someone could point me in the right direction, it would be very much appreciated.
#sample data (for 195 entries):
["Header0","Header1","Foundation","Header3"],
["abc1","a12n","","123"],
["def2","d13b","1","456"],
["ghi3","g12n","","789"],
def Foundation( csv_file_path, Remove_Header = False, Remove_SubHeader = False ):
    delineator = ','
    raw_file = file(csv_file_path, 'r')
    return_List = []
    n = 0
    #Process lines in file
    for line in raw_file.readlines():
        #Check if to include or remove header
        if (n == 0 ) and (Remove_Header == True):
            n = n + 1
            continue
        #Check if to include or remove sub header
        if (n == 1) and (Remove_SubHeader == True):
            n = n + 1
            continue
        sList2 = line.replace("\n","").strip().split( delineator )
        col_2 = str(sList2.pop(2))
        for n in col_2:
            if n == "1":
                col_2 = col_2.replace("1", "B")
            elif n == "":
                col_2 = col_2.replace("", "N")
        print col_2
        return_List.append(sList2) #add my secondary list back to my main List? right?
        sList2.insert(0, col_2)# insert back to my secondary list where it went
        n = n + 1 #add to counter and move down the line
    raw_file.close()
    #Return the list
    return return_List
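One likely cause: "for n in col_2" loops over the characters of the string, and an empty string has no characters, so the loop body (including the elif n == "" branch) never runs for the blank values. The loop also overwrites the line counter n. A minimal sketch of one way around this is to compare the whole value instead of its characters. It uses the csv module and an in-memory copy of the sample data; the function name recode_foundation and its parameters are made up for this example.

```python
import csv
import io

# In-memory stand-in for the csv file, based on the sample data
SAMPLE = '''"Header0","Header1","Foundation","Header3"
"abc1","a12n","","123"
"def2","d13b","1","456"
"ghi3","g12n","","789"
'''

# Looping over an empty string runs zero times
iterations = sum(1 for _ in "")


def recode_foundation(csv_file, column=2, skip_header=True):
    """Replace "1" with "B" and empty values with "N" in one column."""
    rows = []
    for line_number, row in enumerate(csv.reader(csv_file)):
        if skip_header and line_number == 0:
            continue
        value = row[column].strip()
        if value == "1":
            row[column] = "B"
        elif value == "":
            row[column] = "N"
        rows.append(row)
    return rows


rows = recode_foundation(io.StringIO(SAMPLE))
for row in rows:
    print(row)
```

With a real file, the io.StringIO object would be replaced by an open file handle, for example open(csv_file_path, newline='') in Python 3.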

Pseudo-random ordering in django queryset

Suppose I have a QuerySet that returns 10 objects, 3 of which will be displayed in the following positions:
[ display 1 position ] [ display 2 position ] [ display 3 position ]
The model representing it is as follows:
class FeaturedContent(models.Model):
image = models.URLField()
position = models.PositiveSmallIntegerField(blank=True, null=True)
where position can be either 1, 2, 3, or unspecified (Null).
I want to be able to order the QuerySet randomly EXCEPT FOR the objects with a specified position. However, I can't order it by doing:
featured_content = FeaturedContent.objects.order_by('-position', '?')
Because if I had one item that had position = 2, and all the other items were Null, then the item would appear in position 1 instead of position 2.
How would I do this ordering?
Thinking about this, perhaps it would be best to have the data as a dict instead of a list, something like:
`{'1': item or null, '2': item or null, '3': item or null, '?': [list of other items]}`
If you use a db backend that does random ordering efficiently you could do it like this:
# This will hold the result
featured_dict = {}
# Evaluate both querysets once; positioned items must be in ascending order
# so that they can be matched against pos = 1, 2, 3 in turn
featured_pos = list(FeaturedContent.objects.filter(position__isnull=False).order_by('position'))
featured_rand = list(FeaturedContent.objects.filter(position__isnull=True).order_by('?'))
pos_index = 0
rand_index = 0
for pos in range(1, 4):
    content = None
    if pos_index < len(featured_pos) and featured_pos[pos_index].position == pos:
        content = featured_pos[pos_index]
        pos_index += 1
    elif rand_index < len(featured_rand):
        content = featured_rand[rand_index]
        rand_index += 1
    featured_dict[str(pos)] = content
# Slicing a list past its end just returns an empty list, so no index check is needed
featured_dict['?'] = featured_rand[rand_index:]
If you just want to iterate over the queryset you can have two querysets, order them and chain them.
import itertools

qs1 = FeaturedContent.objects.filter(position__isnull=False).order_by('-position')
qs2 = FeaturedContent.objects.filter(position__isnull=True).order_by('?')
featured_content = itertools.chain(qs1, qs2)
for item in featured_content:
    # do something with qs item
    print(item)
Update:
Since you want position to determine the order, with the "blank" slots filled randomly by elements whose position is null, you can do the following if the featured list you want is not too large (20 in this case):
featured = []
rands = []
for i in range(1, 21):
    try:
        x = FeaturedContent.objects.get(position=i)  # assuming position is unique
    except FeaturedContent.DoesNotExist:
        if not rands:
            rands = list(FeaturedContent.objects.filter(position__isnull=True).order_by('?')[:20])
        x = rands[0]
        rands = rands[1:]
    featured.append(x)
I would post process it, doing a merge sort between the ordered and unordered records.
EDIT:
The beginnings of a generator for this:
def posgen(posseq, arbseq, posattr='position', startpos=1):
    posel = next(posseq)
    for cur in itertools.count(startpos):
        if getattr(posel, posattr) == cur:
            yield posel
            posel = next(posseq)
        else:
            yield next(arbseq)
Note that there are lots of error conditions possible in this code (hint: StopIteration).
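As one illustration of those error conditions: since Python 3.7, a StopIteration that escapes from next() inside a generator body is turned into a RuntimeError. Below is a minimal sketch of the same merge written against plain Python objects so that neither sequence running out raises. The Item tuple stands in for model instances, and merge_by_position and its slots parameter are hypothetical names; with Django, the two inputs would be the positioned and the randomly ordered querysets.

```python
from collections import namedtuple

# Hypothetical stand-in for FeaturedContent instances
Item = namedtuple("Item", ["name", "position"])


def merge_by_position(positioned, others, slots, startpos=1):
    """Fill `slots` display positions, preferring items with a fixed position.

    Slots with no positioned item take the next item from `others`;
    if `others` runs out, the slot is left as None.
    """
    by_position = {item.position: item for item in positioned}
    fillers = iter(others)
    result = []
    for pos in range(startpos, startpos + slots):
        if pos in by_position:
            result.append(by_position[pos])
        else:
            # next() with a default never raises StopIteration
            result.append(next(fillers, None))
    return result


positioned = [Item("fixed", 2)]
others = [Item("x", None), Item("y", None)]
display = merge_by_position(positioned, others, slots=3)
print([item.name for item in display])
```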