I have a file that I want to unpack so that I can use its columns in different files. The problem is that the number of columns varies from row to row (for example, row 1 could have 7 columns and row 2 could have 15).
How do I unpack the file without receiving the error "Too many values to unpack"?
filehandle3 = open('output_steps.txt', 'r')
filehandle4 = open('head_cluster.txt', 'w')
for line in iter(filehandle3):
    id, category = line.strip('\n').split('\t')
    filehandle4.write(id + "\t" + category + "\n")
filehandle3.close()
filehandle4.close()
Any help would be great. Thanks!
You should extract the values separately, if present, e.g. like this:
for line in iter(filehandle3):
    values = line.strip('\n').split('\t')
    id = values[0] if len(values) > 0 else None
    category = values[1] if len(values) > 1 else None
    ...
You could also create a helper function for this:
def safe_get(values, index, default=None):
    return values[index] if len(values) > index else default
or using try/except:
def safe_get(values, index, default=None):
    try:
        return values[index]
    except IndexError:
        return default
and use it like this:
category = safe_get(values, 1)
With Python 3, and if the rows always have at least as many elements as you need, you can use
for line in iter(filehandle3):
    id, category, *junk = line.strip('\n').split('\t')
This will bind the first element to id, the second to category, and the rest to junk.
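The two ideas above can be combined. The following is only a minimal sketch, assuming tab-separated rows held in memory instead of being read from output_steps.txt; the sample rows and the helper name first_two are made up for illustration. Padding the split result means short rows fall back to a default, while star-unpacking discards any extra columns:

```python
def first_two(line, default=None):
    """Return the first two tab-separated fields of a line, padded with default."""
    values = line.rstrip('\n').split('\t')
    # Pad short rows so that unpacking two names never fails
    # (a negative multiplier yields an empty list, so long rows are untouched).
    values += [default] * (2 - len(values))
    # Keep the first two fields and ignore everything after them.
    row_id, category, *_ = values
    return row_id, category


# Hypothetical sample rows with 3, 2 and 1 columns.
sample = ["a1\tfruit\textra\n", "b2\tveg\n", "c3\n"]
pairs = [first_two(line) for line in sample]
print(pairs)
```

Like the star-unpacking form, this requires Python 3.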
I'm trying to count the total number of invoices where the records are consecutive.
I am unsure about what it is you exactly want. I'm going to base my expectations on your code.
There are three errors in the code:
1. Function parameters should be variable names, not quoted strings.
def count_consecutive_invoice(df, invoiceNumber):
    retlist = []
    for i in range(len(df[invoiceNumber]) - 1):
Notice the removal of the quotes around invoiceNumber. Its value is set later, when you call the function.
2. You are subscripting the function itself instead of the data frame column:
if count_consecutive_invoice[i] + 1 == count_consecutive_invoice[i + 1]:
Should be
if df[invoiceNumber][i] + 1 == df[invoiceNumber][i + 1]:
3. You need to initialise every variable before using it, including count.
To do this, just add count = 1 after retlist = [].
This code should work:
import pandas as pd

df = pd.read_excel(r'MYPath\Book1.xlsx')

def count_consecutive_invoice(df, invoiceNumber):
    retlist = []
    count = 1
    for i in range(len(df[invoiceNumber]) - 1):
        # Check if the next number is consecutive
        if df[invoiceNumber][i] + 1 == df[invoiceNumber][i+1]:
            count += 1
        elif count > 1:
            # If it is not and count > 1 append the count and restart counting
            retlist.append(count)
            count = 1
    # Since we stopped the loop one early append the last count
    retlist.append(count)
    return retlist

output = count_consecutive_invoice(df, 'Invoice Number')
print(output)
output:
[4]
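To try the counting logic without an Excel file, here is a rough sketch on a plain Python list. The function name consecutive_run_lengths and the sample invoice numbers are made up for illustration, and unlike the function above it records the final run only when it is longer than one:

```python
def consecutive_run_lengths(numbers):
    """Return the length of every run of consecutive integers longer than one."""
    runs = []
    count = 1
    # Compare each number with the one that follows it.
    for current, following in zip(numbers, numbers[1:]):
        if current + 1 == following:
            count += 1
        else:
            # The run ended: record it if it had more than one element.
            if count > 1:
                runs.append(count)
            count = 1
    # Record the run that was still open when the loop finished.
    if count > 1:
        runs.append(count)
    return runs


# Hypothetical invoice numbers: one run of four, two isolated values, one run of two.
invoice_numbers = [101, 102, 103, 104, 200, 300, 301]
run_lengths = consecutive_run_lengths(invoice_numbers)
print(run_lengths)
```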
Here is my commented solution.
It rebuilds a pandas data frame. You need to pass the name of the ID column and the name of the column on which the invoices are counted.
def count_consecutive_invoice(table, invoice_row_name, id_row_name):
    invoiced_table = {} # the output
    for row in table:
        if row != invoice_row_name:
            invoiced_table[row] = []
    invoiced_table['Cont of Consecutive Invoices'] = []
    streak = False # keep track of streaking invoices cause on first invoice we need to add 2, not 1
    for line in range(len(table[invoice_row_name]) - 1):
        id = table[id_row_name][line]
        if not id in invoiced_table[id_row_name]:
            for row in table:
                if row != invoice_row_name:
                    invoiced_table[row].append(table[row][line])
            invoiced_table['Cont of Consecutive Invoices'].append(0)
        if id == table[id_row_name][line+1]: #check the vendor id so if you get the invoicing for each
            if table[invoice_row_name][line]+1 == table[invoice_row_name][line+1]: # check the actual invoicing
                itable_line = invoiced_table[id_row_name].index(id)
                invoiced_table['Cont of Consecutive Invoices'][itable_line] += 1 + int(not streak) #otherwise we add 1 or 2 depending on the streak status
                streak = True
                continue
        streak = False
    return invoiced_table

invoiced = count_consecutive_invoice(df, "Invoice ID", "Vendor ID")
print(pd.DataFrame.from_dict(invoiced))
I need to calculate the majority vote for the TARGET_LABEL column of my CSV file in Python.
I have a data frame with a Row ID and an assigned TARGET_LABEL. What I need is the count of the majority TARGET_LABEL. How do I do this?
For example, the data is in this form:
Row ID      TARGET_LABEL
Row2        0
Row6        0
Row7        0
Row10       0
Row12       0
Row15       1
.           .
.           .
Row99999    1
I have a Python script that only reads the data from the CSV file. Here it is:
import csv
ifile = open('file1.csv', "rb")
reader = csv.reader(ifile)
rownum = 0
for row in reader:
    # Save header row.
    if rownum == 0:
        header = row
    else:
        colnum = 0
        for col in row:
            print '%-8s: %s' % (header[colnum], col)
            colnum += 1
    rownum += 1
ifile.close()
If TARGET_LABEL does not contain any NaN values, you could use:
counts = df['TARGET_LABEL'].value_counts()
max_counts = counts.max()
Otherwise if it could contain NaN values, use
df = df.dropna(subset=['TARGET_LABEL'])
removes all the NaN values
df['TARGET_LABEL'].value_counts().max()
should give you the max counts,
df['TARGET_LABEL'].value_counts().idxmax()
should give you the most frequent value.
The collections module contains the class Counter, which works like a dict (or more precisely like a defaultdict(lambda: 0)) and which can be used to find the most frequent item.
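As a small sketch of the Counter approach, assuming the TARGET_LABEL values have already been read into a plain list (the sample labels below are made up), most_common(1) returns the most frequent value together with its count:

```python
from collections import Counter

# Made-up stand-in for the TARGET_LABEL column.
labels = [0, 0, 0, 0, 0, 1, 1]

counts = Counter(labels)
# most_common(1) returns a one-element list of (value, count) pairs.
majority_label, majority_count = counts.most_common(1)[0]
print(majority_label, majority_count)
```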
I can't get Python to display the number of columns in the array. The rows show up just fine, though.
def getDataArray1D(filename):
    fileHandle = open(filename, 'r')
    fileData=map(float, fileHandle) # !!
    fileHandle.close()
    return fileData
data = getDataArray1D("HEIGHT.csv")
#print data
rows = len(data)
columns =len(data[0])
print rows, columns
I'm not 100% sure what your question is, but to get the length of the float, you can do
def returnLength(number):
    return len(str(number))
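If the goal is instead the number of columns per row, note that map(float, fileHandle) converts each whole line into a single float, so data[0] is a number and has no length. A possible sketch, assuming HEIGHT.csv is comma-separated with several numbers per line (the in-memory sample below is a made-up stand-in for the file), is to parse each line into a list of floats first:

```python
import csv
import io

# Made-up stand-in for HEIGHT.csv; replace with open("HEIGHT.csv", newline="").
sample = io.StringIO("1.0,2.5,3.0\n4.0,5.5,6.0\n")

# Each row becomes a list of floats, so the result is two-dimensional.
data = [[float(cell) for cell in row] for row in csv.reader(sample)]

rows = len(data)
columns = len(data[0]) if data else 0
print(rows, columns)
```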
I'm trying to implement a function to find occurrences in a list, here's my code:
def all_numbers():
    num_list = []
    c.execute("SELECT * FROM myTable")
    for row in c:
        num_list.append(row[1])
    return num_list

def compare_results():
    look_up_num = raw_input("Lucky number: ")
    occurrences = [i for i, x in enumerate(all_numbers()) if x == look_up_num]
    return occurrences
I keep getting an empty list instead of the occurrences, even when I enter a number that is in the list.
Your code does the following:
1. It fetches everything from the database. Each row is a sequence.
2. Then, it takes all these results and adds them to a list.
3. It returns this list.
4. Next, your code goes through each item in the list (remember, it's a sequence, like a tuple) and fetches the item and its index (this is what enumerate does).
5. Next, you attempt to compare the sequence with a string, and if it matches, return it as part of a list.
At #5, the script fails because you are comparing a tuple to a string. Here is a simplified example of what you are doing:
>>> def all_numbers():
...     return [(1,5), (2,6)]
...
>>> lucky_number = 5
>>> for i, x in enumerate(all_numbers()):
...     print('{} {}'.format(i, x))
...     if x == lucky_number:
...         print 'Found it!'
...
0 (1, 5)
1 (2, 6)
As you can see, at each loop, your x is the tuple, and it will never equal 5; even though actually the row exists.
You can have the database do your dirty work for you, by returning only the number of rows that match your lucky number:
def get_number_count(lucky_number):
    """ Returns the number of times the lucky_number
    appears in the database """
    c.execute('SELECT COUNT(*) FROM myTable WHERE number_column = %s', (lucky_number,))
    result = c.fetchone()
    return result[0]

def get_input_number():
    """ Get the number to be searched in the database """
    lookup_num = raw_input('Lucky number: ')
    return get_number_count(lookup_num)
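The counting query can be tried without a real database. This is only a sketch, assuming an in-memory SQLite database with a made-up two-column table; the column name number_column is a placeholder carried over from the snippet above. Note that sqlite3 uses ? as its parameter placeholder, whereas %s is what some other drivers expect:

```python
import sqlite3

# In-memory database with made-up rows, standing in for the real myTable.
conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.execute("CREATE TABLE myTable (id INTEGER, number_column INTEGER)")
c.executemany("INSERT INTO myTable VALUES (?, ?)", [(1, 5), (2, 6), (3, 5)])


def get_number_count(lucky_number):
    """Return how many rows have number_column equal to lucky_number."""
    c.execute("SELECT COUNT(*) FROM myTable WHERE number_column = ?", (lucky_number,))
    return c.fetchone()[0]


count = get_number_count(5)
missing = get_number_count(42)
print(count, missing)
conn.close()
```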
raw_input is returning a string. Try converting it to a number.
occurrences = [i for i, x in enumerate(all_numbers()) if x == int(look_up_num)]
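The effect of the missing conversion is easy to reproduce. In this short sketch the list and the typed value are made up, and the list stands in for the result of all_numbers(); comparing an integer with a string is never true, so the first comprehension finds nothing:

```python
numbers = [3, 7, 7, 12]  # made-up stand-in for all_numbers()
typed = "7"              # keyboard input always arrives as a string

# Comparing int to str is always False, so no index matches.
as_string = [i for i, x in enumerate(numbers) if x == typed]

# Convert once, outside the comprehension, then compare numbers with numbers.
target = int(typed)
as_int = [i for i, x in enumerate(numbers) if x == target]

print(as_string, as_int)
```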
I've been trying to add up the numbers of two lists inside a dictionary. I need to check whether the value in the selected row and column is already in the dictionary; if so, I want to add the two-entry list to the value (another two-entry list) that already exists in the dictionary. I'm using an Excel spreadsheet and xlrd so I can read it. I'm pretty new to this.
For example, the code checks the account (a number) in the specified row and column. Let's say the value is 10: if it's not in the dictionary, the code adds the two values corresponding to this account, let's say [100, 0], as the value for this key. This is working as intended.
Now, the hard part is when the account number is already in the dictionary. Let's say it's the second entry for account number 10, and it's [50, 20]. I want the value associated with the key "10" to be [150, 20].
I've tried the zip method, but it seems to return random results: sometimes it adds up, sometimes it doesn't.
import xlrd
book = xlrd.open_workbook("Entry.xls")
print ("The number of worksheets is", book.nsheets)
print ("Worksheet name(s):", book.sheet_names())
sh = book.sheet_by_index(0)
print (sh.name,"Number of rows", sh.nrows,"Number of cols", sh.ncols)
liste_compte = {}
for rx in range(4, 10):
    if (sh.cell_value(rowx=rx, colx=4)) not in liste_compte:
        liste_compte[((sh.cell_value(rowx=rx, colx=4)))] = [sh.cell_value(rowx=rx, colx=6), sh.cell_value(rowx=rx, colx=7)]
    elif (sh.cell_value(rowx=rx, colx=4)) in liste_compte:
        three = [x + y for x, y in zip(liste_compte[sh.cell_value(rowx=rx, colx=4)],[sh.cell_value(rowx=rx, colx=6), sh.cell_value(rowx=rx, colx=7)])]
        liste_compte[(sh.cell_value(rowx=rx, colx=4))] = three
print (liste_compte)
I'm not going to directly untangle your code, but just help you with a general example that does what you want:
def update_balance(existing_balance, new_balance):
    for column in range(len(existing_balance)):
        existing_balance[column] += new_balance[column]

def update_account(accounts, account_number, new_balance):
    if account_number in accounts:
        update_balance(existing_balance = accounts[account_number], new_balance = new_balance)
    else:
        accounts[account_number] = new_balance
And finally you'd do something like this (assuming each row of your xls looks like [account_number, balance 1, balance 2]):
accounts = dict()
for row in xls:
    # row[1:3] takes both balances; row[1:2] would only take the first one
    update_account(accounts = accounts,
                   account_number = row[0],
                   new_balance = row[1:3])
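An alternative worth considering is collections.defaultdict, which removes the need to check whether the account already exists. This is only a sketch: the rows below are made up and stand in for the values read from the spreadsheet, in the same [account_number, balance 1, balance 2] layout assumed above.

```python
from collections import defaultdict

# Made-up rows: (account_number, balance 1, balance 2).
rows = [(10, 100, 0), (20, 5, 5), (10, 50, 20)]

# Every new account starts with its own fresh [0, 0] list.
totals = defaultdict(lambda: [0, 0])
for account_number, first, second in rows:
    totals[account_number][0] += first
    totals[account_number][1] += second

print(dict(totals))
```

Because the factory builds a new list for each key, accounts never share the same list object.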