read two columns in Excel using python - python-2.7

I want to write a Python script that reads an xlsx file and, based on the value of column X, writes/appends the value of column Z to a file.
Sample data:
Column A    Column X    Column Y    Column Z
123         abc         test        value 1
124         xyz         test        value 2
125         xyz         test        value 3
126         abc         test        value 4
If the value in Column X is "abc", the script should create a file (if it does not already exist) named abc.txt in some path and append the value of Column Z to it; likewise, if Column X is "xyz", it should create xyz.txt in the same path and append the value of Column Z to that file.
from openpyxl import load_workbook

wb = load_workbook('filename.xlsm')
ws = wb.active
for cell in ws.columns[9]:  # column 9 is the one I am testing, i.e. Column X of my example
    if cell.value == "abc":
        print ws.cell(column=12).value  # this is not working and I don't know how to read the corresponding value of another column
Please suggest what could be done.
Thank you.

Change
print ws.cell(column=12).value
to:
print ws.columns[col][row].value
in your case:
print ws.columns[12-1][cell.row-1].value
Note that with this indexing method columns and rows start at index 0. That is why I use cell.row-1; take it into account when addressing your column as well: if your 12 counts from 1, you will have to use 11.
Alternatively, you can access the cell you need like this: ws.cell(row=cell.row, column=12).value. Note that in this case columns and rows start at 1.
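Putting the pieces together, here is a minimal sketch of the whole task using ws.cell, assuming (as in the question) that Column X is the 9th column, Column Z is the 12th, and row 1 is a header:

from openpyxl import load_workbook

wb = load_workbook('filename.xlsm')
ws = wb.active

X_COL = 9   # Column X (1-based column index, as in the question)
Z_COL = 12  # Column Z (1-based column index, as in the question)

for row in range(2, ws.max_row + 1):  # row 1 is assumed to be the header
    x_value = ws.cell(row=row, column=X_COL).value
    if x_value in ("abc", "xyz"):
        z_value = ws.cell(row=row, column=Z_COL).value
        # opening in append mode creates <x_value>.txt if it does not exist yet
        with open('%s.txt' % x_value, 'a') as out:
            out.write('%s\n' % z_value)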

Related

vlookup between two worksheets in VBA

I have tried a lot to do a VLOOKUP between two workbooks: one is open (book1) and the other
needs to be opened via the GetOpenFilename() function; let us call it (book2).
(book1) contains sheet1, where I want to do the VLOOKUP.
(book2) contains sheet2, where I want to match the ID (column 1) and get the names (column 2) by VLOOKUP.
Here is my code. Notice that the number of rows changes every month, so I need to loop until the last row in sheet1 (in book1).
Sub status_first_Step()
    Range("C:C").Insert
    Set book1 = Sheets("Sheet1")
    book2 = Application.GetOpenFilename()
    Workbooks.Open book2
    Set book2 = ActiveWorkbook
    For i = 1 To Cells(Rows.Count, "A").End(xlUp).Row
        ws1.Cells(i, "C").Value = Application.VLookup(book1.Cells(i, 1).Value, book2.Sheets("sheet2").Columns("A:Y"), 2, 0)
    Next i
End Sub
Thank You.

How to find a match between any partial value in column A and any value in column B? And extract?

I have two columns in a pandas DataFrame, both also containing a lot of null values. Some values in column B exist partially in a field (or multiple fields) of column A. I want to check whether a value of B exists in A and, if so, separate this value out and add it as a new row in column A.
Example:
Column A | Column B
black bear | null
black box | null
red fox | null
red fire | null
green tree | null
null | red
null | yellow
null | black
null | red
null | green
And I want the following:
Column A
black
bear
box
red
fire
fox
yellow
green
Does anyone have any tips on how to get this result? I have tried using regex (re.match), but I am struggling with the fact that I do not have a fixed pattern but a variable one (namely, any value in column B). This is my effort:
import re

list_A = df['Column A'].values.tolist()
list_B = df['Column B'].values.tolist()

for i in list_A:
    for j in list_B:
        if i != None:
            if re.match('{j}.+', i):
                ...
Note: the columns are over 2500 rows long.
If I understand your question correctly, you want to split the value of b out of the value in a whenever b is found in a, and then store the separated values separately. How about trying the following?
import re

list_A = df['Column A'].values.tolist()
list_B = df['Column B'].values.tolist()

list_of_separated_values = []
for a in list_A:
    for b in list_B:
        if b in a:
            # split a on b, keeping b itself thanks to the capturing group, and drop empty strings
            list_of_separated_values.extend([val for val in re.split('({})'.format(b), a) if val])
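For illustration, here is what that split produces for one of the example rows (a small standalone check using values from the sample data above):

import re

a, b = 'black bear', 'black'
# re.split with a capturing group keeps the matched value itself;
# filtering out empty strings leaves ['black', ' bear'] (note the leading space on ' bear')
parts = [val for val in re.split('({})'.format(b), a) if val]
print(parts)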
This is not a regex question. You have your data in a DataFrame, so use the DataFrame functionality to fix it.
Assuming data_frame is your pandas DataFrame:
# filter the DataFrame to just those with `null` in Column A
filtered = data_frame[data_frame["Column A"].isnull()]
# in the filtered table, assign Column B to Column A
filtered["Column A"] = filtered["Column B"]
# set Column B to null/None (I'm assuming you want this or this step can be skipped)
filtered["Column B"] = None
print(data_frame)
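One caveat with that snippet: filtered is a filtered copy, so those assignments do not propagate back to data_frame before print(data_frame). If the goal is to modify data_frame itself, the same idea can be written with .loc on the original frame (a variant sketch using the same column names):

# boolean mask of the rows where Column A is null
mask = data_frame["Column A"].isnull()

# move the Column B values into Column A for those rows,
# then clear Column B (skip this line if Column B should be kept)
data_frame.loc[mask, "Column A"] = data_frame["Column B"]
data_frame.loc[mask, "Column B"] = None

print(data_frame)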

Read multiple excel sheets on a specific column and write them to one csv file using python

I have multiple sheets in one Excel file, like Sheet1, Sheet2, Sheet3, etc. Now I have to list one particular column from all of them in one csv file. All the sheets have one common column, "Attribute", and only those values should be listed in the csv file line by line (the first sheet's 'Attribute' values should be on the 1st line, the second sheet's 'Attribute' values on the 2nd line, and so on).
For instance:
Sheet1:
Attribute,Order
P,1
Emp_ID,2
DOJ,3
Name,4
Sheet2:
Attribute,Order
C,1
Emp_ID,2
Exp,3
LWD,4
Expected result: (In some .csv file)
P,Emp_ID,DOJ,name
C,Emp_ID,Exp,LWD
Note: the line starting with P should be the first line, the line starting with C the second line, and so on.
Below is my code:
import pandas as pd
excel = 'E:\Python Utility\Inbound.xlsx'
K = 'E:\Python Utility\Headers_Files\All_Header.csv'
df = pd.read_excel(excel,sheet_name = None)
data = pd.DataFrame(df,columns=['Attribute']).T
print data
M = data.to_csv(K, encoding='utf-8',index=False,header=False)
print 'done'
The output shows as below:
Empty DataFrame Columns: [] Index: [Attribute] done
If I use sheet_name = 'sheet1' then the DataFrame works fine and the data is loaded into the csv file as expected.
Thanks in advance
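For reference, pd.read_excel(excel, sheet_name=None) returns a dict of DataFrames keyed by sheet name rather than a single DataFrame, which would explain why pd.DataFrame(df, columns=['Attribute']) comes out empty. Below is a minimal sketch of writing one csv line of 'Attribute' values per sheet; the paths and column name are taken from the question, and the Python 2 'wb' file mode is an assumption based on the print statements above:

import csv
import pandas as pd

excel = 'E:\Python Utility\Inbound.xlsx'
K = 'E:\Python Utility\Headers_Files\All_Header.csv'

sheets = pd.read_excel(excel, sheet_name=None)  # dict: sheet name -> DataFrame

with open(K, 'wb') as f:  # binary mode for the csv module on Python 2
    writer = csv.writer(f)
    for name, frame in sheets.items():
        # one csv row per sheet, holding that sheet's Attribute values
        writer.writerow(frame['Attribute'].tolist())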

Python - webscraping; dictionary data structure

I need to scrape this website (http://setkab.go.id/profil-kabinet/#) and produce an Excel file that has headers "Cabinet names" in column 1 and "Era" in column 2. That means each Cabinet name (e.g. Kabinet Presidensil, Kabinet Sjahrir I) should have its own row - alongside its respective era (e.g. Era Revolusi Fisik, Era Republik Indonesia Serikat).
This is the closest I've gotten:
import requests
from bs4 import BeautifulSoup

response = requests.get('http://setkab.go.id/profil-kabinet/#')
soup = BeautifulSoup(response.text, 'html.parser')
eras = soup.find_all('div', attrs={'class': "wpb_accordion_section group"})

setkab = {}
for element in eras:
    setkab[element.a.get_text()] = {}

for element in eras:
    cabname = element.find('div', attrs={'class': 'wpb_wrapper'}).get_text()
    setkab[element.a.get_text()]['cbnm'] = cabname

for item in setkab.keys():
    print item + setkab[item]['cbnm']

import os, csv
os.chdir("/Users/mxcodes/Code")

with open("setkabfinal.csv", "w") as toWrite:
    writer = csv.writer(toWrite, delimiter=",")
    writer.writerow(["Era", "Cabinet name"])
    for a in setkab.keys():
        writer.writerow([a.encode("utf-8"), setkab[a]["cbnm"]])
However, this creates an Excel file with the headers "Era" and "Cabinet names" in column 1 and 2, respectively. It fails to put each Cabinet name in a separate row. For example, it has 'Era Revolusi Fisik' in column 1 and lists all the cabinets together in column 2.
My guess is that I need to switch the key-value pairs somehow so that each Cabinet becomes a key and its era becomes its value - because currently it's the other way around. But I've tried and failed to do so. Any help? Thank you!
From what I can see, the cabinets[a]["cbnm"] value you use for writing is just one long Unicode string, so when you do writer.writerow([a.encode("utf-8"), cabinets[a]["cbnm"]]) what actually happens is that the era is written in the first column and the whole string lands in a single cell of the next column. Even if the string contains \n characters, that does not stop it from being written into one cell: csv assumes you want the string in only one cell, so it puts quotes before and after the cabinets[a]["cbnm"] value to make sure of that. To write every cabinet value on its own row, call the writerow method separately for each desired row.
For example, this code worked fine for me:
cabinets = setkab
with open("cabinets.csv", "w") as toWrite:
    writer = csv.writer(toWrite, delimiter=",")
    writer.writerow(["Era", "Cabinet name"])
    for a in setkab.keys():
        writer.writerow([a.encode("utf-8")])  # write the era on its own row
        # get all the values that are separated by newline chars (skipping empty strings)
        cabinets_list = [i for i in cabinets[a]["cbnm"].split('\n') if i != '']
        for i in cabinets_list:
            writer.writerow([a.encode("utf-8"), i])  # write every value separately in the cabinet-name rows
As you can see, I changed only the last 3 lines.
I hope this will help you!

Extracting columnar data correctly as it is in the file

Suppose I have a tabular column as below. Now I want to extract the data column-wise. I tried extracting the data by creating a list, but while the first row is extracted correctly, from the second row onwards there is leading space (i.e. under CEN/4), so my code treats 5.000000E-01 as the zeroth column from the second row and starts reading from there. How do I extract the data correctly column-wise? The output is scrambled.
0 1 25 CEN/4 -5.000000E-01 -3.607026E+04 -5.747796E+03 -8.912796E+02 -88.3178
5.000000E-01 3.607026E+04 5.747796E+03 8.912796E+02 1.6822
27 -5.000000E-01 -3.641444E+04 -5.783247E+03 -8.912796E+02 -88.3347
5.000000E-01 3.641444E+04 5.783247E+03 8.912796E+02 1.6653
28 -5.000000E-01 -3.641444E+04 -5.712346E+03 -8.912796E+02 -88.3386
5.000000E-01 3.641444E+04 5.712346E+03 8.912796E+02
My code is:
f1 = open('newdata1.txt', 'w')
L = []
for index, line in enumerate(open('Trial_1.txt', 'r')):
    # print index
    if index < 0:  # skip first 5 lines
        continue
    else:
        line = line.split()
        L.append('%s\t%s\t %s\n' % (line[0], line[1], line[2]))
f1.writelines(L)
f1.close()
My output looks like this:
0 1 CEN/4 -5.000000E-01 -5.120107E+04
5.000000E-01 5.120107E+04 1.028093E+04 5.979930E+03 8.1461
I want the columnar data as it is in the file. How do I do that? I am a beginner.
It's hard to tell from the way the input data is presented in your question, but I'm guessing your file uses tabs to separate columns. In any case, consider using the Python csv module with the relevant delimiter, like:
import csv

with open('input.csv') as f_in, open('newdata1', 'w') as f_out:
    reader = csv.reader(f_in, delimiter='\t')
    writer = csv.writer(f_out, delimiter='\t')
    for row in reader:
        writer.writerow(row)
See the Python csv module documentation for further details.
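If the columns in the file are actually separated by runs of spaces rather than tabs (as the pasted sample suggests), splitting on arbitrary whitespace may be a better fit. A small sketch of that variant, reusing the file names from the question:

# sketch: whitespace-separated input, tab-separated output (file names as in the question)
with open('Trial_1.txt') as f_in, open('newdata1.txt', 'w') as f_out:
    for line in f_in:
        fields = line.split()  # split on any run of whitespace
        if fields:             # skip blank lines
            f_out.write('\t'.join(fields) + '\n')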