Printing Results from Loops - python-2.7

I have a piece of code that works in two segments. The first segment opens an existing text file from a specific path on my local drive and arranges selected fields from each line, picked by index, into a list of sub-lists. The second segment (starting at def merge_subs) takes those sub-lists and groups them by their shared first element to simplify them. I get no error, but nothing is printed when I try to print the variable answer. Am I looping over the original list of sub-lists incorrectly? Ultimately I would like a variable that holds the final product of these loops so that I can write its contents to a new text file. Here is the code I am working with:
from itertools import groupby, chain
from operator import itemgetter
with open("somepathname") as g:
    # reads text from lines and turns them into a list of sub-lists
    lines = g.readlines()
    for line in lines:
        matrix = line.split()
        JD = matrix[2]
        minTime = matrix[5]
        maxTime = matrix[7]
        newLists = [JD, minTime, maxTime]
L = newLists
def merge_subs(L):
    dates = {}
    for sub in L:
        date = sub[0]
        if date not in dates:
            dates[date] = []
        dates[date].extend(sub[1:])
    answer = []
    for date in sorted(dates):
        answer.append([date] + dates[date])
New code:
def openfile(self):
    filename = askopenfilename(parent=root)
    self.lines = open(filename)
def simplify(self):
    g = self.lines.readlines()
    for line in g:
        matrix = line.split()
        JD = matrix[2]
        minTime = matrix[5]
        maxTime = matrix[7]
        self.newLists = [JD, minTime, maxTime]
        print(self.newLists)
    dates = {}
    for sub in self.newLists:
        date = sub[0]
        if date not in dates:
            dates[date] = []
        dates[date].extend(sub[1:])
    answer = []
    for date in sorted(dates):
        print(answer.append([date] + dates[date]))
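Going only by the code shown, a few things would explain the missing result: the sub-list variable is reassigned on every pass through the loop, so only the last line survives; merge_subs never returns answer and is never called; and in the newer version, answer.append(...) returns None, so printing it shows None. A minimal sketch of one way to restructure it follows. The function name parse_rows and the 8-field length check are assumptions added for illustration, not part of the original code.

```python
def parse_rows(lines):
    """Build one [JD, minTime, maxTime] sub-list per input line."""
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) < 8:
            continue  # assumption: skip lines too short to hold index 7
        rows.append([fields[2], fields[5], fields[7]])
    return rows


def merge_subs(rows):
    """Group sub-lists that share the same first element (the date)."""
    dates = {}
    for sub in rows:
        dates.setdefault(sub[0], []).extend(sub[1:])
    # Return the result so the caller can print it or write it to a file.
    return [[date] + dates[date] for date in sorted(dates)]
```

With the file open as g, answer = merge_subs(parse_rows(g.readlines())) would then hold the grouped result, ready to print or write out.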

Related

Concatenate two data frames to a data frame of square matrix

I have two pandas dataframes whose shapes are "n x n" and "m x n" (m < n). For example:
df1 = pd.DataFrame([[0,1,0,1],[1,0,0,1],[0,0,0,1],[1,1,1,0]])
df2 = pd.DataFrame([[1,1,1,0],[1,1,0,1]])
I'd like to get a dataframe holding a square matrix by concatenating the above dataframes:
df3 = foo(df1, df2)
print df3.values
This should print the following matrix:
[[0,1,0,1,1,1],
[1,0,0,1,1,1],
[0,0,0,1,1,0],
[1,1,1,0,0,1],
[1,1,1,0,0,0],
[1,1,0,1,0,0]]
The logic of the concatenation is as follows:
the upper-left part of the square matrix comes from df1
the upper-right part comes from the transpose of df2
the bottom-left part comes from df2
every element of the remaining (bottom-right) part is zero.
How do I implement the above logic (foo method)?
Here is a sample of foo:
def foo(_df1,_df2):
    df1 = _df1.reset_index(drop=True) #to make sure the index is ordered
    df2 = _df2.reset_index(drop=True) #to make sure the index is ordered
    df2_transpose = df2.transpose().reset_index(drop=True) #reset the index to match the join below
    df_upper = df1.join(df2_transpose,rsuffix="_") #add suffix for additional columns
    df_upper.columns = [i for i in range(df_upper.shape[1])] #reset column names to int
    df = pd.concat([df_upper,df2]) #fill the bottom left
    df.fillna(0,inplace=True) #fill with 0 the bottom right
    return df
The foo function:
import numpy as np
import pandas as pd
def foo(df1_data, df2_data):
    df_test = pd.concat([df1_data, df2_data])
    # zero block that sits under the transpose of df2 (bottom-right corner)
    n_pad = df_test.values.shape[0] - df_test.values.shape[1]
    zeros = np.zeros(shape=(n_pad, df2_data.values.shape[0]))
    a = np.concatenate((df2_data.values.T, zeros))
    final_array = np.append(df_test.values, a, axis=1).astype(int)
    df3_data = pd.DataFrame(final_array)
    return df3_data
df3 = foo(df1, df2)
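For checking either version of foo, the four-block layout described in the question can be sketched without pandas at all. This is only an illustration on plain nested lists; the name square_from_blocks is hypothetical, and the inputs are assumed to be an n x n list and an m x n list.

```python
def square_from_blocks(top_left, bottom_left):
    """Assemble [[A, B^T], [B, 0]] from A (n x n) and B (m x n) as nested lists."""
    n = len(top_left)
    m = len(bottom_left)
    # Transpose of B: n rows, each with m entries.
    transposed = [list(col) for col in zip(*bottom_left)]
    # Upper half: each row of A followed by the matching row of B^T.
    upper = [list(top_left[i]) + transposed[i] for i in range(n)]
    # Lower half: each row of B followed by m zeros.
    lower = [list(bottom_left[j]) + [0] * m for j in range(m)]
    return upper + lower
```

Called with df1.values.tolist() and df2.values.tolist(), the result can be compared against foo(df1, df2).values.tolist().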

Python - reading text file delimited by semicolon, plotting chart using openpyxl

I have copied a text file into an Excel sheet, splitting each line into cells on the ; delimiter.
I need to plot a chart from the same sheet, which I have managed to do. However, since all the copied values are of type str, my chart shows the wrong points.
Please suggest how to overcome this. The plot should be made from int values.
from datetime import date
from openpyxl import Workbook,load_workbook
from openpyxl.chart import (
    LineChart,
    Reference,
    Series,
)
from openpyxl.chart.axis import DateAxis
excelfile = "C:\Users\lenovo\Desktop\how\openpychart.xlsx"
wb = Workbook()
ws = wb.active
f = open("C:\Users\lenovo\Desktop\sample.txt")
data = []
num = f.readlines()
for line in num:
    line = line.split(";")
    ws.append(line)
f.close()
wb.save(excelfile)
wb.close()
wb = load_workbook(excelfile, data_only=True)
ws = wb.active
c1 = LineChart()
c1.title = "Line Chart"
##c1.style = 13
c1.y_axis.title = 'Size'
c1.x_axis.title = 'Test Number'
data = Reference(ws, min_col=6, min_row=2, max_col=6, max_row=31)
series = Series(data, title='4th average')
c1.append(series)
data = Reference(ws, min_col=7, min_row=2, max_col=7, max_row=31)
series = Series(data, title='Defined Capacity')
c1.append(series)
##c1.add_data(data, titles_from_data=True)
# Style the lines
s1 = c1.series[0]
s1.marker.symbol = "triangle"
s1.marker.graphicalProperties.solidFill = "FF0000" # Marker filling
s1.marker.graphicalProperties.line.solidFill = "FF0000" # Marker outline
s1.graphicalProperties.line.noFill = True
s2 = c1.series[1]
s2.graphicalProperties.line.solidFill = "00AAAA"
s2.graphicalProperties.line.dashStyle = "sysDot"
s2.graphicalProperties.line.width = 100050 # width in EMUs
##s2 = c1.series[2]
##s2.smooth = True # Make the line smooth
ws.add_chart(c1, "A10")
##
##from copy import deepcopy
##stacked = deepcopy(c1)
##stacked.grouping = "stacked"
##stacked.title = "Stacked Line Chart"
##ws.add_chart(stacked, "A27")
##
##percent_stacked = deepcopy(c1)
##percent_stacked.grouping = "percentStacked"
##percent_stacked.title = "Percent Stacked Line Chart"
##ws.add_chart(percent_stacked, "A44")
##
### Chart with date axis
##c2 = LineChart()
##c2.title = "Date Axis"
##c2.style = 12
##c2.y_axis.title = "Size"
##c2.y_axis.crossAx = 500
##c2.x_axis = DateAxis(crossAx=100)
##c2.x_axis.number_format = 'd-mmm'
##c2.x_axis.majorTimeUnit = "days"
##c2.x_axis.title = "Date"
##
##c2.add_data(data, titles_from_data=True)
##dates = Reference(ws, min_col=1, min_row=2, max_row=7)
##c2.set_categories(dates)
##
##ws.add_chart(c2, "A61")
### setup and append the first series
##values = Reference(ws, (1, 1), (9, 1))
##series = Series(values, title="First series of values")
##chart.append(series)
##
### setup and append the second series
##values = Reference(ws, (1, 2), (9, 2))
##series = Series(values, title="Second series of values")
##chart.append(series)
##
##ws.add_chart(chart)
wb.save(excelfile)
wb.close()
I modified the for loop as shown below, and it worked.
f = open("C:\Users\lenovo\Desktop\sample.txt")
data = []
num = f.readlines()
for line in num:
    line = line.split(";")
    new_line=[]
    for x in line:
        if x.isdigit():
            x=int(x)
            new_line.append(x)
        else:
            new_line.append(x)
    ws.append(new_line)
f.close()
wb.save(excelfile)
wb.close()
For each list, check every value: if it consists of digits, convert it to an integer, then store it in another list.
Using x=map(int,x) didn't work since I have character values too.
I felt the above is much easier than using x=map(int,x) with try and except.
Thanks
Basha
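One caveat with the isdigit() check: it returns False for negative numbers, decimals, and any field that still carries the trailing newline (the last field of each line does after split(";")), so those cells would remain strings. A hedged sketch of the try/except alternative follows; the helper names to_number and parse_line are made up for illustration.

```python
def to_number(text):
    """Return an int or float when the text parses as one, else the stripped text."""
    cleaned = text.strip()  # drops the trailing newline on the last field
    for convert in (int, float):
        try:
            return convert(cleaned)
        except ValueError:
            pass  # not this numeric type; try the next one
    return cleaned


def parse_line(line, sep=";"):
    """Split one delimited line and convert each field where possible."""
    return [to_number(field) for field in line.split(sep)]
```

In the loop above, ws.append(parse_line(line)) would then replace the inner for loop. Note that float() also accepts text such as "nan" or "inf", which may or may not be wanted.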

Memory Error when exporting data to csv file

Hello, I was hoping someone could help me with my college coursework. I keep running into a memory error in my data export.
Is there any way I can reduce the memory being used, or is there a different approach I can take?
For the coursework I am given a CSV file of 300 customer-order records, and I have to export the Friday records to a new CSV file. I am also required to print the most popular ordering method and the total money raised from the orders, but I have an easy plan for that.
This is my first time working with CSV, so I'm not sure how to do it. When I run the program it tends to crash instantly or stop responding. Once it showed 'MEMORY ERROR', but nothing more. I'm using a college-provided computer, so I am not sure of the exact specs, but I know it has 4GB of memory.
# defining count occurrences predefined function
def countOccurences(target,array):
    counter = 0
    for element in array:
        if element == target:
            counter= counter + 1
    print counter
    return counter
# creating user defined functions for the program
# dataInput function used for collecting data from provided file
def dataInput():
    import csv
    recordArray = []
    customerArray = []
    f = open('E:\Portable Python 2.7.6.1\Choral Shield Data File(CSV).csv')
    csv_f = csv.reader(f)
    for row in csv_f:
        customerArray.append(row[0])
        ticketID = row[1]
        day, area = datasplit(ticketID)
        customerArray.append(day)
        customerArray.append(area)
        customerArray.append(row[2])
        customerArray.append(row[3])
        recordArray.append(customerArray)
    f.close
    return recordArray
def datasplit(variable):
    day = variable[0]
    area = variable[1]
    return day,area
def dataProcessing(recordArray):
    methodArray = []
    wed_thursCost = 5
    friCost = 10
    record = 0
    while record < 300:
        method = recordArray[record][4]
        methodArray.append(method)
        record = record+1
    school = countOccurences('S',methodArray)
    website = countOccurences('W',methodArray)
    if school > website:
        school = True
    elif school < website:
        website = True
    dayArray = []
    record = 0
    while record < 300:
        day = recordArray[record][1]
        dayArray.append(day)
        record = record + 1
    fridays = countOccurences('F',dayArray)
    wednesdays = countOccurences('W',dayArray)
    thursdays = countOccurences('T', dayArray)
    totalFriCost = fridays * friCost
    totalWedCost = wednesdays * wed_thursCost
    totalThurCost = thursdays * wed_thursCost
    totalCost = totalFriCost + totalWedCost + totalThurCost
    return totalCost,school,website
# My first attempt at writing to a csv file
def dataExport(recordArray):
    import csv
    fridayRecords = []
    record = 0
    customerIDArray = []
    ticketIDArray = []
    numberArray = []
    methodArray = []
    record = 0
    while record < 300:
        if recordArray[record][1] == 'F':
            fridayRecords.append(recordArray[record])
        record = record + 1
    with open('\Courswork output.csv',"wb") as f:
        writer = csv.writer(f)
        for record in fridayRecords:
            writer.writerows(fridayRecords)
    f.close
# My second attempt at writing to the CSV file
def write_file(recordArray): # write selected records to a new csv file
    CustomerID = []
    TicketID = []
    Number = []
    Method = []
    counter = 0
    while counter < 300:
        if recordArray[counter][2] == 'F':
            CustomerID.append(recordArray[counter][0])
            TicketID.append(recordArray[counter][1]+recordArray[counter[2]])
            Number.append(recordArray[counter][3])
            Method.append(recordArray[counter][4])
    fridayRecords = [] # a list to contain the lists before writing to file
    for x in range(len(CustomerID)):
        one_record = CustomerID[x],TicketID[x],Number[x],Method[x]
        fridayRecords.append(one_record)
    #open file for writing
    with open("sample_output.csv", "wb") as f:
        #create the csv writer object
        writer = csv.writer(f)
        #write one row (item) of data at a time
        writer.writerows(recordArray)
    f.close
    counter = counter + 1
#Main Program
recordArray = dataInput()
totalCost,school,website = dataProcessing(recordArray)
write_file(recordArray)
In your second attempt, in the function write_file(recordArray), the variable counter is never updated inside the first while loop, so the loop never terminates and keeps running until you run out of memory.
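One way to avoid this class of bug is to iterate over the rows directly instead of maintaining a counter. The following is a rough sketch, not a drop-in replacement: the names friday_rows and write_rows are hypothetical, and it assumes the raw CSV layout from dataInput, where column 1 holds the ticket ID and its first character is the day letter.

```python
import csv


def friday_rows(rows):
    """Keep rows whose ticket ID (column 1) starts with 'F' (assumed Friday marker)."""
    return [row for row in rows if len(row) > 1 and row[1][:1] == 'F']


def write_rows(rows, out_file):
    """Write all rows to an already-open file object."""
    writer = csv.writer(out_file)
    writer.writerows(rows)  # a single call writes every row once
```

In Python 2 the output file would be opened with open(name, "wb"); in Python 3 with open(name, "w", newline=""). Because a for loop or comprehension ends when the data ends, there is no counter to forget and no hard-coded 300.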

Python - Convert dictionary (having "list" as values) into csv file

I am trying to write the dictionary below into a csv file, with the desired output shown below.
dict_data = {"1":["xyz"],
"2":["abc","def"],
"3":["zzz"]
}
desired output:
1,3,2
xyz,zzz,abc
def
The code below doesn't work as expected: it keeps both "abc" and "def" in the same cell, as shown below.
with open('k.csv','wb') as out_file:
    writer = csv.writer(out_file,dialect = 'excel')
    headers = [k for k in dict_data]
    items = [dict_data[k] for k in dict_data]
    writer.writerow(headers)
    writer.writerow(items)
output:
1,3,2
xyz,zzz,abc,def
Here is the complete solution:
import csv
import os
class CsvfileWriter:
    '''
    Takes dictionary as input and writes items into a CSV file.
    For ex:-
    Input dictionary:
    dict_data = {"1":["xyz"],"2":["abc","def"],"3":["zzz"]}
    Output: (CSV file)
    1,3,2
    xyz,zzz,abc
    ,,def
    '''
    def __init__(self,dictInput,maxLength=0):
        '''
        Creates an instance with the following variables.
        dictInput & maxLength
        dictInput -> dictionary having values(list) of same length
        ex:-
        dict_data = {"1":["xyz",""],"2":["abc","def"],"3":["zzz",""]}
        maxLength -> length of the list
        '''
        self.dictInput = dictInput
        self.maxLength = maxLength
    @classmethod
    def list_padding(cls,dictInput):
        '''
        Converts an input dictionary having lists (as values) of varying lengths into constant length.
        Also returns class variables dictInput & maxLength
        Note:
        dictInput represents the dictionary after padding is applied.
        maxLength represents the length of the list(values in dictionary) having maximum number of items.
        Ex:-
        input dictionary:
        dict_data = {"1":["xyz"],"2":["abc","def"],"3":["zzz"]}
        output dictionary:
        dict_data = {"1":["xyz",""],"2":["abc","def"],"3":["zzz",""]}
        '''
        cls.dictInput = dictInput
        listValues = dictInput.values()
        listValues.sort(key = lambda i: len(i))
        maxLength = len(listValues[-1])
        for i in listValues:
            while(len(i) < maxLength):
                i.append('')
        return cls(dictInput,maxLength)
    def write_to_csv(self):
        with open('sample_file.csv','wb') as out_file:
            writer = csv.writer(out_file,dialect = 'excel')
            headers = [k for k in self.dictInput]
            items = [self.dictInput[k] for k in self.dictInput]
            writer.writerow(headers)
            c = 0
            while (c < self.maxLength):
                writer.writerow([i[c] for i in items])
                c += 1
dict_data = {"1":["xyz"],"2":["abc","def"],"3":["zzz"]}
cf = CsvfileWriter.list_padding(dict_data)
cf.write_to_csv()
The following works in Python 2:
import csv
dict_data = {
"1":["xyz"],
"2":["abc","def"],
"3":["zzz"]
}
def transpose(cols):
    return map(lambda *row: list(row), *cols)
with open('k.csv','w') as out_file:
    writer = csv.writer(out_file,dialect = 'excel')
    headers = dict_data.keys()
    items = transpose(dict_data.values())
    writer.writerow(headers)
    writer.writerows(items)
I can't take credit for the transpose function, which I picked up from here. It turns a list of columns into a list of rows, automatically padding columns that are too short with None. Fortunately, the csv writer outputs blanks for None values, which is exactly what's needed.
(In Python 3, map behaves differently (no padding), so it would require some changes.)
Edit: A replacement transpose function that works for both Python 2 and 3 is:
def transpose(cols):
    def mypop(l):
        try:
            return l.pop(0)
        except IndexError:
            return ''
    while any(cols):
        yield [mypop(l) for l in cols]
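As another option, the standard library's itertools can do the padding. The sketch below uses zip_longest (named izip_longest in Python 2) and, unlike the pop-based version above, leaves the input lists unmodified. The fill parameter name is an illustrative choice.

```python
try:
    from itertools import zip_longest  # Python 3
except ImportError:
    from itertools import izip_longest as zip_longest  # Python 2


def transpose(cols, fill=''):
    """Turn a list of columns into a list of rows, padding short columns with fill."""
    return [list(row) for row in zip_longest(*cols, fillvalue=fill)]
```

It can be used the same way as before: writer.writerows(transpose(list(dict_data.values()))).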

Importing and analysing text data using Python 2.7

I have created code in Python 2.7 which saves sales data for various products into a text file using the write() method. My limited Python skills have hit the wall with the next step - I need code which can read this data from the text file and then calculate and display the mean average number of sales of each item. The data is stored in the text file like the data shown below (but I am able to format it differently if that would help).
Product A,30
Product B,26
Product C,4
Product A,40
Product B,18
Product A,31
Product B,13
Product C,3
After far too long Googling around this to no avail, any pointers on the best way to manage this would be greatly appreciated. Thanks in advance.
You can read the file, split each line on the space (' ') and then on the comma, and build a dictionary in which each product letter is a key whose value is the list of its sales figures. The average is then sum divided by len.
Example
products = {}
with open("myfile.txt") as product_info:
    data = product_info.read().split('\n') #Split by line
for item in data:
    if not item.strip():
        continue #Skip blank lines, e.g. after a trailing newline
    _temp = item.split(' ')[1].split(',') #"Product A,30" -> ['A', '30']
    if _temp[0] not in products:
        products[_temp[0]] = [int(_temp[1])]
    else:
        products[_temp[0]] = products[_temp[0]]+[int(_temp[1])]
product_list = [[item, float(sum(values))/len(values)] for item, values in products.items()]
product_list.sort(key=lambda x:x[0])
for item in product_list:
    print 'The average of {} is {}'.format(item[0], item[1])
from __future__ import division
dict1 = {}
dict2 = {}
file1 = open("input.txt",'r')
for line in file1:
    if len(line)>2:
        data = line.split(",")
        a,b = data[0].strip(),data[1].strip()
        if a in dict1:
            dict1[a] = dict1[a] + int(b)
        else:
            dict1[a] = int(b)
        if a in dict2:
            dict2[a] = dict2[a] + 1
        else:
            dict2[a] = 1
for k,v in dict1.items():
    for m,n in dict2.items():
        if k == m:
            avg = float(v/n)
            print "%s Average is: %0.6f"%(k,float(avg))
Output:
Product A Average is: 33.666667
Product B Average is: 19.000000
Product C Average is: 3.500000
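The same sum-and-count idea can be written with a single dictionary, which avoids the nested loop over two dictionaries. This is a sketch under the assumption that every non-blank line has the form "name,number"; the function name mean_sales is hypothetical, and a non-numeric amount would raise ValueError.

```python
from collections import defaultdict


def mean_sales(lines):
    """Map each product name to the mean of its sales figures."""
    totals = defaultdict(lambda: [0, 0])  # product -> [running sum, count]
    for line in lines:
        name, sep, amount = line.strip().rpartition(",")
        if not sep:
            continue  # blank line or no comma: nothing to record
        entry = totals[name.strip()]
        entry[0] += int(amount)
        entry[1] += 1
    # float() keeps the division exact under Python 2 as well.
    return {name: float(total) / count for name, (total, count) in totals.items()}
```

Passing an open file works directly, for example mean_sales(open("input.txt")), since a file object iterates over its lines.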