Storing Multiple Values in an Array into Excel using Pandas - python-2.7

I'm having trouble writing the values of an array into an Excel column. At the moment I can only write the first data set from each array, meaning within Excel I get:
COL:A
Mazda3
COL:B
Civic
COL:C
Corolla
COL:D
Altima
tmpList.extend([mazda, honda, Toyota, Nissan])
df = pd.DataFrame(tmpList)
df = df.transpose()
xlsfile = 'pandle.xlsx'
writer = pd.ExcelWriter(xlsfile, engine='xlsxwriter')
df.to_excel(writer, sheet_name="Sheet1",startrow=1, startcol=1, header=False, index=False)
writer.save()
Ideally I want:
COL:A
Mazda3, CX7, CX5
COL:B
Civic, Accord, Pilot
COL:C
Corolla, Camry, Sienna
COL:D
Altima, Pathfinder, Maxima
//EDIT
So I'm able to write multiple data sets now, but the result is inverted: everything is written by column and not by row as I would prefer.
xlsfile = 'pandle2.xlsx'
writer = pd.ExcelWriter(xlsfile, engine='xlsxwriter')
#df = pd.DataFrame(mylist, columns=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'])
df = pd.DataFrame(mylist, columns=['1A', '2B', '3C', '4D', '5E', '6F', '7G', '8H', '9I'])
df = df.transpose()
df.to_excel(writer, sheet_name="Sheet1",startrow=1, startcol=1, header=False, index=False)
writer.save()

I figured it out!
mylist.append([x,y,z,])
df1 = pd.DataFrame(mylist)
Basically, pass your list of lists straight to pandas. Note that you need to import pandas as pd for this sample to work.
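To get the layout asked for in the question (one make per column, one model per row), a minimal sketch looks like this. The variable names and model lists are taken from the question's examples; zip() is used here as one way to turn the per-make lists into rows:

```python
import pandas as pd

mazda  = ['Mazda3', 'CX7', 'CX5']
honda  = ['Civic', 'Accord', 'Pilot']
toyota = ['Corolla', 'Camry', 'Sienna']
nissan = ['Altima', 'Pathfinder', 'Maxima']

# zip() pairs the i-th model of every make into one row, so each
# original list ends up as its own column A..D.
df = pd.DataFrame(list(zip(mazda, honda, toyota, nissan)),
                  columns=['A', 'B', 'C', 'D'])
print(df)

# Writing is then unchanged from the question:
# df.to_excel('pandle.xlsx', sheet_name='Sheet1', header=False, index=False)
```

This avoids the transpose step entirely, since the data is built column-wise from the start.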

Related

CSV reader putting \n after each row

I have generated a CSV file from Excel.
I am trying to read this CSV file using Python's csv module. However, after each row I get \n. How do I remove this \n?
Here is my code:
with open('/users/ps/downloads/test.csv', 'rU') as csvfile:
    spamreader = csv.reader(csvfile, dialect=csv.excel_tab)
    a = []
    for row in csvfile:
        a.append(row)
    print a
I get result like this:
['HEADER\n', 'a\n', 'b\n', 'c\n', 'd\n', 'e']
I want to have results like this:
['HEADER', 'a', 'b', 'c', 'd', 'e']
You could try a replace:
a.replace('\n', '')
Edit: working version:
a.append(row.replace('\n', ''))
You can use strip():
x = ['HEADER\n', 'a\n', 'b\n', 'c\n', 'd\n', 'e']
In [6]: def f(word):
   ...:     return word.strip()
   ...:
In [7]: map(f, x)
Out[7]: ['HEADER', 'a', 'b', 'c', 'd', 'e']
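Worth noting: the trailing newlines appear because the question's loop iterates over the raw file object instead of the reader. Iterating the csv reader itself already strips row terminators, as this small sketch shows (written in Python 3 syntax, with io.StringIO standing in for the question's file):

```python
import csv
import io

# Simulated file contents matching the question's output.
data = io.StringIO('HEADER\na\nb\nc\nd\ne')

# Iterating the reader (not the file) yields parsed rows with no '\n'.
rows = [row[0] for row in csv.reader(data)]
print(rows)
```

So no replace() or strip() is needed if the reader is iterated directly.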

How to merge dictionaries on their keys when they contain several different list values?

Someone asked what my input looks like: it is the output of a preceding function.
When I do
print(H1_dict)
the following is printed to the screen:
defaultdict(<class 'list'>, {2480: ['A', 'C', 'C'], 2651: ['T', 'A', 'G']})
which means the data type is a defaultdict mapping keys to lists.
So something like this:
H1dict = {2480: ['A', 'C', 'C'], 2651: ['T', 'A', 'G'].....}
H2dict = {2480: ['C', 'T', 'T'], 2651: ['C', 'C', 'A'].....}
H1_p1_values = {2480: ['0.25', '0.1', '0.083'], 2651: ['0.43', '0.11', '0.23']....}
H1_p2_values = {2480: ['0.15', '0.15', '0.6'], 2651: ['0.26', '0.083', '0.23']....}
H2_p1_values = {2480: ['0.3', '0.19', '0.5'], 2651: ['0.43', '0.17', '0.083']....}
H2_p2_values = {2480: ['0.3', '0.3', '0.1'], 2651: ['0.39', '0.26', '0.21']....}
I want to merge these dictionaries as:
merged_dict (class, list) or (key, values)= {2480: h1['A', 'C', 'C'], h2 ['C', 'T', 'T'], h1_p1['0.25', '0.1', '0.083'], h1_p2['0.15', '0.15', '0.6'], h2_p1['0.3', '0.19', '0.5'], h2_p2['0.3', '0.3', '0.1'], 2651: h1['T', 'A', 'G'], h2['C', 'C', 'A']....}
So, I want to merge several dictionaries using key values but maintain the order in which different dictionary are supplied.
For merging the dictionaries I am able to do it partially using:
merged = [haplotype_A, haplotype_B, hapA_freq_My, hapB_freq_My....]
merged_dict = {}
for k in haplotype_A:
    merged_dict[k] = tuple(d[k] for d in merged)
But I want to add a next level of keys in front of each list, so I can access specific items in a large file when needed.
Downstream I want to access the values inside this merged dictionary by key, each time with a for loop. Something like:
for k, v in merged_dict.items():
    h1_p1_sum = sum(float(x) for x in v[h1_p1])
    h1_p1_prod = mul(float(x) for x in v[h1_p1])
    h1_string = "-".join(str(x) for x in v[h1_index_level])
and the ability to print or write it to the file line by line
print (h1_string)
print (h1_p1_sum)
I have read several examples of defaultdict and other dicts but am not able to wrap my head around the process. I have managed simple operations, but something like this seems a little complicated. I would really appreciate any explanation you can add for each step of the process.
Thank you in advance!
If I understand you correctly, you want this:
merged = {'h1': haplotype_A, 'h2': haplotype_B, 'h3': hapA_freq_My, ...}
merged_dict = defaultdict(dict)
for var_name in merged:
    for k in merged[var_name]:
        merged_dict[k][var_name] = merged[var_name][k]
This should give you an output of:
>>>merged_dict
{'2480': {'h1': ['A', 'C', 'C'], 'h2': ['C', 'T', 'T'], ..}, '2651': {...}}
given, of course, that the variables match your example data.
You can access them via nested for loops:
for k in merged_dict:
    for sub_key in merged_dict[k]:
        print(merged_dict[k][sub_key])  # print entire list
        for item in merged_dict[k][sub_key]:
            print(item)  # prints each item in the list
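Putting the answer together with the question's sample data, a runnable sketch looks like this (only h1 and h2 are shown; the frequency dictionaries would be added to merged in the same way):

```python
from collections import defaultdict

# Sample data copied from the question.
haplotype_A = {2480: ['A', 'C', 'C'], 2651: ['T', 'A', 'G']}
haplotype_B = {2480: ['C', 'T', 'T'], 2651: ['C', 'C', 'A']}

# Label each source dictionary; the labels become second-level keys.
merged = {'h1': haplotype_A, 'h2': haplotype_B}

merged_dict = defaultdict(dict)
for var_name in merged:
    for k in merged[var_name]:
        merged_dict[k][var_name] = merged[var_name][k]

print(dict(merged_dict))
```

Each position key (2480, 2651) now maps to an inner dict keyed by the labels, so specific lists can be pulled out as merged_dict[2480]['h1'].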

identifying which rows are present in another dataframe

I have two dataframes df1 and df2, which I'm told share some rows. That is, for some index pairs (i, j), df1.loc[i] == df2.loc[j] exactly. I would like to find this correspondence.
This has been a tricky problem to track down. I don't want to "manually" inquire about each of the columns for each of the rows, so I've been searching for something cleaner.
This is the best I have but it's not fast. I'm hoping some guru can point me in the right direction.
matching_idx = []
for ix in df1.index:
    match = df1.loc[ix:ix].to_dict(orient='list')
    matching_idx.append(df2.isin(match).all(axis=1))
It would be nice to get rid of the for loop but I'm not sure it's possible.
Assuming the rows in each dataframe are unique, you can concatenate the two dataframes and search for duplicates.
df1 = pd.DataFrame({'A': ['a', 'b'], 'B': ['a', 'c']})
df2 = pd.DataFrame({'A': ['c', 'a'], 'B': ['c', 'a']})
>>> df1
A B
0 a a
1 b c
>>> df2
A B
0 c c
1 a a
df = pd.concat([df1, df2])
# Returns the index values of duplicates in `df2`.
>>> df[df.duplicated()]
A B
1 a a
# Returns the index value of duplicates in `df1`.
>>> df[df.duplicated(keep='last')]
A B
0 a a
You can do a merge that joins on all columns:
match = df1.merge(df2, on=list(df1.columns))
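The two answers can be checked side by side on the sample frames from the first answer:

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['a', 'b'], 'B': ['a', 'c']})
df2 = pd.DataFrame({'A': ['c', 'a'], 'B': ['c', 'a']})

# Approach 1: concatenate and flag duplicates.
df = pd.concat([df1, df2])
dup_in_df2 = df[df.duplicated()]              # index values of shared rows in df2
dup_in_df1 = df[df.duplicated(keep='last')]   # index values of shared rows in df1

# Approach 2: inner merge on all columns keeps only shared rows.
match = df1.merge(df2, on=list(df1.columns))
print(match)
```

The concat approach recovers the indices on both sides, while the merge approach returns the shared row values themselves.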

Dictionary Key Error

I am trying to construct a dictionary from the values in a CSV file. Say there are 10 columns; I want to set the first column as the key and the remaining columns as the values.
When I build it with a for loop, the dictionary ends up with only one value. Kindly suggest a way.
import csv
import numpy
aname = {}
# load the file with numpy
result = numpy.array(list(csv.reader(open('somefile', "rb"), delimiter=','))).astype('string')
# develop a dict
r = {aname[rows[0]]: rows[1:] for rows in result}
print r[0]
Error as follows.
r = {aname[rows[0]]: rows[1:] for rows in result}
KeyError: '2a9ac84c-3315-5576-4dfd-8bc34072360d|11937055'
I'm not entirely sure what you mean to do here, but does this help:
>>> result = [[1, 'a', 'b'], [2, 'c', 'd']]
>>> dict([(row[0], row[1:]) for row in result])
{1: ['a', 'b'], 2: ['c', 'd']}
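For completeness, the KeyError comes from looking the first column up in the empty dict aname; using the value itself as the key, as the answer suggests, is all that's needed. A minimal sketch with stand-in rows (the real rows would come from csv.reader):

```python
# Stand-in for the parsed CSV rows from the question.
result = [['id1', 'a', 'b'], ['id2', 'c', 'd']]

# Key on the first column directly; no aname lookup required.
r = {row[0]: row[1:] for row in result}
print(r['id1'])
```

Note the resulting dict is keyed by the first-column strings, so it is accessed as r['id1'], not r[0].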

How to write the retrieved DB from MS SQL server into new CSV File with headers using python 2.7.6

I am trying to write a table retrieved from MS SQL Server to a CSV file using Python, with headers (column names) and without any braces or quotes. My code is as follows:
import csv
import pyodbc
outpath="path\\test1.csv"
output = open(outpath, "w")
cnxn = pyodbc.connect('DRIVER={SQL Server};SERVER=SIPLDT0115;DATABASE=First_eg;UID=sa;PWD=wisdom')
cursor = cnxn.cursor()
sql = "select * from First1"
cursor.execute(sql)
rows = cursor.fetchall()
desc = cursor.description
header = (desc[0][0], desc[1][0], desc[2][0], desc[3][0], desc[4][0])
print "%s %3s %s %3s %3s" % header
for row in rows:
    print row
    value = str(row).strip('(')
    output.write(str(value.replace(')', '\n')))
output.close()
f = open("path\\test1.csv").read()
print f
OUTPUT:
F_Name L_Name S_ID Branch Course
('jash', 'u', 123, 'C', 'B')
('jash', 'u', 123, 'C', 'B')
('jash', 'u', 123, 'C', 'B')
'jash', 'u', 123, 'C', 'B'
'jash', 'u', 123, 'C', 'B'
'jash', 'u', 123, 'C', 'B'
The CSV file comes out without headers.
I want to view the data as a table in the CSV file, with a header row. Is that possible?
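One way to fix both problems (missing header, leftover parentheses and quotes) is to let csv.writer do the formatting instead of str()-converting tuples. A sketch, with sample data standing in for cursor.fetchall() and the column names pulled from cursor.description (shown in Python 3 syntax; on Python 2.7 open the file with 'wb' and drop the newline argument):

```python
import csv

# Stand-ins for cursor.description column names and cursor.fetchall() rows.
header = ['F_Name', 'L_Name', 'S_ID', 'Branch', 'Course']
rows = [('jash', 'u', 123, 'C', 'B'),
        ('jash', 'u', 123, 'C', 'B')]

with open('test1.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(header)   # header row first
    writer.writerows(rows)    # then one CSV row per DB row
```

csv.writer handles quoting and separators itself, so no strip('(') or replace(')') tricks are needed.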