How to solve concatenate issue with .cell()? row=row works, column=column gives error - python-2.7

I am looping through an excel sheet, looking for a specific name. When found, I print the position of the cell and the value.
I would like to find the position and value of a neighbouring cell, but I can't get .cell() to work when adding 2 to indicate the cell 2 columns away in the same row.
row=row works, but column=column gives an error, and column + 2 gives an error too. Maybe this is due to me listing columns as 'ABCDEFGHIJ' earlier in my code? (For full code, see below.)
print 'Cell position {} has value {}'.format(cell_name, currentSheet[cell_name].value)
print 'Cell position next door TEST {}'.format(currentSheet.cell(row=row, column=column +2))
Full code:
import openpyxl
file = openpyxl.load_workbook('test6.xlsx', read_only = True)
allSheetNames = file.sheetnames
#print("All sheet names {}" .format(file.sheetnames))
for sheet in allSheetNames:
    print('Current sheet name is {}'.format(sheet))
    currentSheet = file[sheet]
    for row in range(1, currentSheet.max_row + 1):
        #print row
        for column in 'ABCDEFGHIJ':
            cell_name = '{}{}'.format(column,row)
            if currentSheet[cell_name].value == 'sign_name':
                print 'Cell position {} has value {}'.format(cell_name, currentSheet[cell_name].value)
                print 'Cell position TEST {}'.format(currentSheet.cell(row=row, column=column +2))
I get this output:
Current sheet name is Sheet1
Current sheet name is Sheet2
Cell position D5 has value sign_name
and:
TypeError: cannot concatenate 'str' and 'int' objects
I get the same error with "column=column" as with "column=column + 2".
Why does row=row work, but column=column doesn't? And how do I find the cell name of the cell to the right of my resulting D5 cell?

The reason row=row works and column=column doesn't is because your column value is a string (letter from A to J) while the column argument of a cell is expecting an int (A would be 1, B would be 2, Z would be 26, etc.)
There are a few changes I would make in order to more effectively iterate through the cells and find a neighbor. Firstly, OpenPyXl offers sheet.iter_rows(), which given no arguments, will provide a generator of all rows that are used in the sheet. So you can iterate with
for row in currentSheet.iter_rows():
    for cell in row:
because each row is a tuple of the cells in that row.
Then in this new nested for loop, you can get the current column index with cell.column (D gives 4 in recent openpyxl releases; older releases returned the column letter here and exposed the number as cell.col_idx), and the cell to the right (one column over) is currentSheet.cell(row=cell.row, column=cell.column+1). Note that row is now a tuple of cells rather than a number, so the row index comes from cell.row.
Note the difference between the two uses of cell: currentSheet.cell() is a request for a specific cell, while cell.column+1 is the column index of the current cell incremented by 1.
Relevant OpenPyXl documentation:
https://openpyxl.readthedocs.io/en/stable/api/openpyxl.cell.cell.html
https://openpyxl.readthedocs.io/en/stable/api/openpyxl.worksheet.worksheet.html
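The letter-to-number mapping described in the answer (A is 1, B is 2, Z is 26) can be sketched in plain Python. This is only an illustration with hypothetical helper names (column_index, column_letter, neighbour_name); openpyxl ships equivalent helpers in openpyxl.utils (column_index_from_string and get_column_letter), which are probably the better choice in real code.

```python
def column_index(letters):
    """Convert a column label such as 'D' or 'AA' to its 1-based index."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord('A') + 1)
    return index


def column_letter(index):
    """Convert a 1-based column index back to its label."""
    letters = ''
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord('A') + remainder) + letters
    return letters


def neighbour_name(column, row, offset=1):
    """Name of the cell `offset` columns to the right of column+row."""
    return '{}{}'.format(column_letter(column_index(column) + offset), row)


# The cell two columns to the right of D5.
print(neighbour_name('D', 5, offset=2))
```

With the numeric index in hand, a call such as currentSheet.cell(row=row, column=column_index(column) + 2) no longer mixes a string with an integer.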

Related

Change column values based on its position

I'm trying to adjust some columns with negative values in my table: I want all negative values to be changed to 0.
The only problem is that the columns keep changing their names, so I would like to make the adjustment based on column position.
For example, the columns are in positions 3 and 4.
I have created a conditional column to adjust the negative volumes:
#"New Column" = Table.AddColumn(#"Previous Step", "New Column", each if OldColumnName < 0 then 0 else NewColumn),
Is there a way to base this conditional column on the OldColumn position rather than its name?
Add a custom column with the formula
= if Record.Field(_,Table.ColumnNames(Source){2})<0 then 0 else Record.Field(_,Table.ColumnNames(Source){2})
or
= if Record.Field(_,Table.ColumnNames(Source){2})<0 then 0 else [some other column]
where {2} is the zero-based position in the list of column names (so {2} is the third column).
Sample to transform in place to remove negatives:
Stepname = Table.TransformColumns(#"PriorStepNameHere",{{Table.ColumnNames(#"PriorStepNameHere"){2}, each if _<0 then 0 else _, Int64.Type}})
For multiple column transformations:
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    ColumnsToTransform = {Table.ColumnNames(Source){2},Table.ColumnNames(Source){3}},
    #"MultipleTransform" = Table.TransformColumns(Source, List.Transform(ColumnsToTransform,(x)=>{x, each if _<0 then 0 else _, type number}))
in
    #"MultipleTransform"

storing different rows of two lists in new list

I am trying to create a new object by comparing two lists. If the rows match, the row should be removed from splitted_row_list; otherwise it should be appended to a new list containing only the differences between both lists.
Sample data: basically, the data is structured so that splitted_row_list has all the rows that all_rows has, but also contains additional, different rows (which also means the two lists have an unequal number of rows). I am aiming to put these additional rows into a new object.
all_rows[0]:'1390', '139080', '13980', '1380', '139080', '13080'
splitted_row_list[0]:'35335','53527','353529','242424','5222','444'
results = []
for row in splitted_row_list:
    print(row)
    for row1 in all_rows:
        if row1 == row:
            splitted_row_list.remove(row)
        else:
            results.append(row)
print(results)
However, this code just returns all the rows. Does anyone have a suggestion?
The two lists are distinct, so you get every item back: if row1 == row is never true, so nothing is ever removed and every row is appended to results.
There are no differences.
EDIT:
You can simply do:
nonunique = []
for row in list(splitted_row_list):  # iterate over a copy so removing is safe
    print(row)
    if row in all_rows:
        splitted_row_list.remove(row)  # remove() returns None, so append row itself
        nonunique.append(row)
result = splitted_row_list  # the non-unique rows have been removed
If you also want the non-unique rows removed from all_rows, just add an all_rows.remove(row) inside the if.
For the complete set, just concatenate the lists after the loop:
combined = nonunique + splitted_row_list + all_rows
Thanks. I have 'solved' this problem in this particular context by appending a marker string to the matching rows and then filtering on whether a row contains the marker. Not very elegant, but it works.
def append_mark2(splitted_row_list):
    for row in splitted_row_list:
        for row1 in all_rows:
            if row1 == row:
                row.append('jaja')
                print(row)
    return splitted_row_list
def sort_on_appendix(splitted_row_list_appended_mark):
    next_row_list3=[]
    for row in splitted_row_list_appended_mark:
        if 'jaja' not in row:
            print(row)
            next_row_list3.append(row)
    print('next_row_list3:',next_row_list3)
    return next_row_list3
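The same difference can probably be computed without mutating either list or appending marker strings. The following is a minimal sketch, assuming each row is a list of strings as in the sample data; the function name rows_only_in and the shortened sample rows are made up for illustration.

```python
def rows_only_in(candidates, reference):
    """Return the rows of `candidates` that do not occur in `reference`."""
    # Lists are unhashable, so each row is converted to a tuple for set lookup.
    seen = set(tuple(row) for row in reference)
    return [row for row in candidates if tuple(row) not in seen]


# Hypothetical, shortened sample data.
all_rows = [['1390', '139080'], ['13980', '1380']]
splitted_row_list = [
    ['1390', '139080'],
    ['35335', '53527'],
    ['13980', '1380'],
    ['5222', '444'],
]

extra = rows_only_in(splitted_row_list, all_rows)
print(extra)
```

Building a new list avoids removing items from a list while looping over it, which silently skips elements.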

New to python - trying to choose individual columns from transposed matrix

My code is currently as follows:
table = []
with open("harrytest.csv") as f:
    for line in f:
        data = line.split(",")
        table.append(data)
transposed = [[table[j][i] for j in range(len(table))] for i in range(len(table[0]))]
openings = transposed[1][1: - 1]
openings = [float(i) for i in openings]
mean = sum(openings)/len(openings)
print mean
minimum = min(openings)
print minimum
maximum = max(openings)
print maximum
range1 = maximum - minimum
print range1
This only prints one column of 7 for me, and it also leaves out the bottom line. We are not allowed to use the csv module, numpy or pandas; the only modules allowed are os, sys, math and datetime.
How do I write the code so as to get the median, first and last values for any column?
Change this line:
openings = transposed[1][1: - 1]
to this
openings = transposed[1][1:]
and the last row should appear: the slice [1: - 1] stops before the final element, while [1:] runs to the end. Your calculations for mean, min, max and range seem correct.
For the median you have to sort the column and select the one middle element, or the average of the two middle elements. The first and last elements are just openings[0] and openings[-1].
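The median rule described above can be sketched without any imports, which fits the restriction on modules. This is a minimal illustration, assuming the first row of the file is a header; the helper names median and column_values and the sample rows are hypothetical.

```python
def median(values):
    """Median of a non-empty list of numbers."""
    ordered = sorted(values)
    middle = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[middle]  # odd count: the single middle element
    # even count: average of the two middle elements
    return (ordered[middle - 1] + ordered[middle]) / 2.0


def column_values(table, col):
    """Numeric values of column `col`, skipping the header row."""
    return [float(row[col]) for row in table[1:]]


# Hypothetical rows, shaped like the result of line.split(",") on each line.
table = [
    ['date', 'open', 'close\n'],
    ['d1', '3.0', '3.5\n'],
    ['d2', '1.0', '1.5\n'],
    ['d3', '2.0', '2.5\n'],
    ['d4', '4.0', '4.5\n'],
]

openings = column_values(table, 1)
print(median(openings), openings[0], openings[-1])
```

Passing a different column number to column_values gives the same statistics for any of the other columns, so the transposition step is not strictly needed.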

3 column excel find value cell b2 in column a replace with cell c2

I have 3 columns, about 100k rows deep. Column A holds a part description (a webpage title) that contains the part number, e.g. y400cc; cell B1 holds the old part number (y400cc) and cell C1 the new part number (wpy400cc).
How do I find the value of cell B2 in column A and replace it with the value of cell C2?
I am not sure I understood your question fully, but here is my proposal:
Column A contains some text (Example: CELL A2 = "xx456yy")
Column B contains a part number which may or may not be found in A (Example CELL B2 = "456")
Column C contains the new part number (Example: CELL C2 = "900")
Column D gets the following formula, which replaces the Column B text found in Column A with the Column C text (and returns an empty string when the Column B text is not found):
=IF(ISNUMBER(FIND(B2,A2)),REPLACE(A2,FIND(B2,A2),LEN(B2),C2),"")
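For readers who want to check the logic outside Excel, the formula can be mirrored in a few lines of Python. This is only a sketch of the same rule (replace the first occurrence, return an empty string when there is no match); the function name replace_part_number is hypothetical.

```python
def replace_part_number(description, old, new):
    """Replace the first occurrence of `old` in `description` with `new`.

    Returns '' when `old` is absent, like the worksheet formula.
    """
    position = description.find(old)  # case-sensitive, like FIND
    if position == -1:
        return ''
    return description[:position] + new + description[position + len(old):]


print(replace_part_number('xx456yy', '456', '900'))
```

A match at the very start of the description is handled like any other match, which is the case the part-number example in the question depends on.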

Combining data from two dataframe columns into one column

I have time series data in two separate DataFrame columns which refer to the same parameter but are of differing lengths.
On dates where data only exist in one column, I'd like this value to be placed in my new column. On dates where there are entries for both columns, I'd like to have the mean value. (I'd like to join using the index, which is a datetime value)
Could somebody suggest a way that I could combine my two columns? Thanks.
Edit2: I've written some code which should merge the data from both of my columns, but I get a KeyError when I try to set the new values using the index generated from rows where one column has values but the other doesn't. Here's the code:
def merge_func(df):
    null_index = df[(df['DOC_mg/L'].isnull() == False) & (df['TOC_mg/L'].isnull() == True)].index
    df['TOC_mg/L'][null_index] = df[null_index]['DOC_mg/L']
    notnull_index = df[(df['DOC_mg/L'].isnull() == True) & (df['TOC_mg/L'].isnull() == False)].index
    df['DOC_mg/L'][notnull_index] = df[notnull_index]['TOC_mg/L']
    df.insert(len(df.columns), 'Mean_mg/L', 0.0)
    df['Mean_mg/L'] = (df['DOC_mg/L'] + df['TOC_mg/L']) / 2
    return df
merge_func(sve)
And here's the error:
KeyError: "['2004-01-14T01:00:00.000000000+0100' '2004-03-04T01:00:00.000000000+0100'\n '2004-03-30T02:00:00.000000000+0200' '2004-04-12T02:00:00.000000000+0200'\n '2004-04-15T02:00:00.000000000+0200' '2004-04-17T02:00:00.000000000+0200'\n '2004-04-19T02:00:00.000000000+0200' '2004-04-20T02:00:00.000000000+0200'\n '2004-04-22T02:00:00.000000000+0200' '2004-04-26T02:00:00.000000000+0200'\n '2004-04-28T02:00:00.000000000+0200' '2004-04-30T02:00:00.000000000+0200'\n '2004-05-05T02:00:00.000000000+0200' '2004-05-07T02:00:00.000000000+0200'\n '2004-05-10T02:00:00.000000000+0200' '2004-05-13T02:00:00.000000000+0200'\n '2004-05-17T02:00:00.000000000+0200' '2004-05-20T02:00:00.000000000+0200'\n '2004-05-24T02:00:00.000000000+0200' '2004-05-28T02:00:00.000000000+0200'\n '2004-06-04T02:00:00.000000000+0200' '2004-06-10T02:00:00.000000000+0200'\n '2004-08-27T02:00:00.000000000+0200' '2004-10-06T02:00:00.000000000+0200'\n '2004-11-02T01:00:00.000000000+0100' '2004-12-08T01:00:00.000000000+0100'\n '2011-02-21T01:00:00.000000000+0100' '2011-03-21T01:00:00.000000000+0100'\n '2011-04-04T02:00:00.000000000+0200' '2011-04-11T02:00:00.000000000+0200'\n '2011-04-14T02:00:00.000000000+0200' '2011-04-18T02:00:00.000000000+0200'\n '2011-04-21T02:00:00.000000000+0200' '2011-04-25T02:00:00.000000000+0200'\n '2011-05-02T02:00:00.000000000+0200' '2011-05-09T02:00:00.000000000+0200'\n '2011-05-23T02:00:00.000000000+0200' '2011-06-07T02:00:00.000000000+0200'\n '2011-06-21T02:00:00.000000000+0200' '2011-07-04T02:00:00.000000000+0200'\n '2011-07-18T02:00:00.000000000+0200' '2011-08-31T02:00:00.000000000+0200'\n '2011-09-13T02:00:00.000000000+0200' '2011-09-28T02:00:00.000000000+0200'\n '2011-10-10T02:00:00.000000000+0200' '2011-10-25T02:00:00.000000000+0200'\n '2011-11-08T01:00:00.000000000+0100' '2011-11-28T01:00:00.000000000+0100'\n '2011-12-20T01:00:00.000000000+0100' '2012-01-19T01:00:00.000000000+0100'\n '2012-02-14T01:00:00.000000000+0100' '2012-03-13T01:00:00.000000000+0100'\n 
'2012-03-27T02:00:00.000000000+0200' '2012-04-02T02:00:00.000000000+0200'\n '2012-04-10T02:00:00.000000000+0200' '2012-04-17T02:00:00.000000000+0200'\n '2012-04-26T02:00:00.000000000+0200' '2012-04-30T02:00:00.000000000+0200'\n '2012-05-03T02:00:00.000000000+0200' '2012-05-07T02:00:00.000000000+0200'\n '2012-05-10T02:00:00.000000000+0200' '2012-05-14T02:00:00.000000000+0200'\n '2012-05-22T02:00:00.000000000+0200' '2012-06-05T02:00:00.000000000+0200'\n '2012-06-19T02:00:00.000000000+0200' '2012-07-03T02:00:00.000000000+0200'\n '2012-07-17T02:00:00.000000000+0200' '2012-07-31T02:00:00.000000000+0200'\n '2012-08-14T02:00:00.000000000+0200' '2012-08-28T02:00:00.000000000+0200'\n '2012-09-11T02:00:00.000000000+0200' '2012-09-25T02:00:00.000000000+0200'\n '2012-10-10T02:00:00.000000000+0200' '2012-10-24T02:00:00.000000000+0200'\n '2012-11-21T01:00:00.000000000+0100' '2012-12-18T01:00:00.000000000+0100'] not in index"
You are close, but you don't actually need to iterate over the rows when using isnull(). By default,
df[(df['DOC_mg/L'].isnull() == False) & (df['TOC_mg/L'].isnull() == True)].index
returns just the index of the rows where DOC_mg/L is not null and TOC_mg/L is null.
Now you can do something like this to set the values for TOC_mg/L:
null_index = df[(df['DOC_mg/L'].isnull() == False) & \
                (df['TOC_mg/L'].isnull() == True)].index
df['TOC_mg/L'][null_index] = df['DOC_mg/L'][null_index] # EDIT To switch the index position.
This uses the index of the rows where TOC_mg/L is null and DOC_mg/L is not null, and sets TOC_mg/L to the values found in DOC_mg/L in the same rows. The KeyError in your version comes from df[null_index]: indexing a DataFrame with a list of labels looks them up among the column names, not in the row index.
Note: This is not the accepted way of setting values using an index (the documented form is df.loc[index, 'col_name'] = ...), but it is how I've been doing it for some time. Just make sure that when setting values, the left side of the assignment is df['col_name'][index]. If col_name and index are switched, you set the values on a copy, which is never written back to the original.
Now to set the mean, create a new column, which we'll call Mean_mg/L, with an initial value of 0.0. Then set this new column to the mean of both columns:
# Insert a new col at the end of the dataframe columns name 'Mean_mg/L'
# with default value 0.0
df.insert(len(df.columns), 'Mean_mg/L', 0.0)
# Set this columns value to the average of DOC_mg/L and TOC_mg/L
df['Mean_mg/L'] = (df['DOC_mg/L'] + df['TOC_mg/L']) / 2
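In the rows where we filled null values with the value from the other column, the average is the same as that value. (The same fill has to be applied in the other direction, from TOC_mg/L into DOC_mg/L, for rows where only TOC_mg/L has a value.)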
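As an alternative to filling each column by hand, the row-wise mean can be taken directly, because DataFrame.mean skips missing values by default. The following is a minimal sketch, assuming pandas and numpy are installed; the sample frame is hypothetical and merely stands in for the asker's sve data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with a datetime index, as in the question.
index = pd.to_datetime(['2004-01-14', '2004-03-04', '2004-03-30', '2004-04-12'])
df = pd.DataFrame(
    {'DOC_mg/L': [2.0, np.nan, 4.0, np.nan],
     'TOC_mg/L': [np.nan, 3.0, 6.0, np.nan]},
    index=index,
)

# Row-wise mean; skipna=True (the default) ignores missing values, so a row
# with a single value yields that value and a row with two yields their mean.
df['Mean_mg/L'] = df[['DOC_mg/L', 'TOC_mg/L']].mean(axis=1)

# Label-based fill with .loc, avoiding chained indexing.
only_doc = df['DOC_mg/L'].notnull() & df['TOC_mg/L'].isnull()
df.loc[only_doc, 'TOC_mg/L'] = df.loc[only_doc, 'DOC_mg/L']

print(df)
```

Rows where both columns are missing stay NaN in Mean_mg/L, which is usually what is wanted for a time series with gaps.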