Python shows Memory Error - python-2.7

I tried a very simple Python script that adds a certain string to each field in every row. The code is:
import csv
List = []
list = []
csv_reader = csv.reader(open('moz_press_IDE_mdoc.csv'))
i = 0
for row in csv_reader:
    List.append(list)
    j = 1
    for num in row:
        tmp = str(i) + ':'
        num = tmp + num
        j += 1
        List[i].append(num)
    i += 1
out = open('newCSV.csv', 'w')
csv_writer = csv.writer(out)
csv_writer.writerow(List)
out.close()
It shows the error message:
Traceback (most recent call last):
  File "preprocess.py", line 19, in <module>
    csv_writer.writerow(List)
MemoryError
Can someone help me with that?

This line
List.append(list)
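doesn't do what you think it does. It doesn't create a new list; it appends another reference to the same list object, so every element of List is that one list, which keeps growing with every field you append via List[i]. writerow(List) then has to turn that huge list into a string once per input row, which is what exhausts memory.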
Instead, write
List.append([])
because (I think) you want a new empty list for each row, not another reference to the one you filled in the previous loop iteration. Print List before passing it to writerow() to see the difference.
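A quick way to see the aliasing described above: appending the same list object several times gives several references to one list, so a change made through any of them shows up in all of them. The variable names here are illustrative only.

```python
shared = []
rows = []
for _ in range(3):
    rows.append(shared)    # appends a reference to the same object each time
rows[0].append('x')        # mutate through the first reference...
print(rows)                # [['x'], ['x'], ['x']]: ...and every "row" changes
print(rows[0] is rows[2])  # True: they are one and the same object

fresh_rows = []
for _ in range(3):
    fresh_rows.append([])  # a new, independent list per iteration
fresh_rows[0].append('x')
print(fresh_rows)          # [['x'], [], []]
```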
And please don't use list as a variable name, because it shadows the built-in type. That makes your code hard to read and may lead to mystifying bugs.
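Putting these suggestions together, a minimal sketch of the whole script might look like the following. It also switches from writerow(List), which writes the entire table as one CSV line, to writerows(), which writes one line per inner list. prefix_rows is a hypothetical helper name, and io.StringIO stands in for the real files so the example is self-contained; with real files, the csv module documentation recommends opening the output as open('newCSV.csv', 'wb') on Python 2.7 or open('newCSV.csv', 'w', newline='') on Python 3.

```python
import csv
import io

def prefix_rows(rows):
    """Return new rows where every field is prefixed with '<row index>:'."""
    result = []
    for i, row in enumerate(rows):
        # A brand-new list per row, so rows never share one list object.
        result.append([str(i) + ':' + field for field in row])
    return result

src = io.StringIO("a,b\nc,d\n")   # stands in for moz_press_IDE_mdoc.csv
dst = io.StringIO()               # stands in for newCSV.csv
writer = csv.writer(dst)
# writerows() writes one CSV line per inner list; writerow(List) would put
# the whole table on a single line.
writer.writerows(prefix_rows(csv.reader(src)))
print(repr(dst.getvalue()))       # '0:a,0:b\r\n1:c,1:d\r\n'
```

The enumerate() call replaces the manual i counter, and the unused j counter from the question is dropped.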

Related

Creating a Dictionary from a while loop (Python)

How can I create a dictionary from my while loop below? infile reads from a file called input.txt, which has the following content:
min:1,23,62,256
max:24,672,5,23
sum:22,14,2,3,89
P90:23,30,45.23
P70:12,23,24,57,32
infile = open("input.txt", "r")
answers = open("output.txt", "w")
while True:
    line = infile.readline()
    if not line: break
    opType = line[0:3]
    numList = (line[4:len(line)])
    numList = numList.split(',')
What I'm trying to do is basically build two lists, one with the operation names (opType) and the other with the numbers. From there I want to create a dictionary that looks like this:
myDictionary = {
    'min': [1, 23, 62, 256],
    'max': [24, 672, 5, 23],
    'avg': [22, 14, 2, 3, 89],
    'P90': [23, 30, 45.23],
    'P70': [12, 23, 24, 57, 32],
}
The reason for this is that I need to pass each operation type to a self-made function, which will then carry out the operation. I'll figure that part out; I currently just need help building the dictionary from the while loop.
I'm using python 2.7
Try the following code.
I believe you also need 'sum' in the dictionary (the file has sum where your example shows avg). If not, just add a condition to skip it.
myDictionary = {}
with open('input.txt','r') as f:
    for line in f:
        x = line.split(':')[1].rstrip().split(',')
        for i in xrange(len(x)):
            try:
                x[i] = int(x[i])
            except ValueError:
                x[i] = float(x[i])
        myDictionary[line.split(':')[0]] = x
print myDictionary
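For reference, here is a Python 3 sketch of the same idea (xrange and the print statement above are Python 2 only). parse_number and parse_ops are hypothetical names; the sketch uses str.partition so each line is split only at its first colon, and it skips blank lines such as a trailing newline at the end of the file. io.StringIO stands in for input.txt.

```python
import io

def parse_number(text):
    """Convert '23' to 23 and '45.23' to 45.23."""
    try:
        return int(text)
    except ValueError:
        return float(text)

def parse_ops(lines):
    """Build {operation name: [numbers]} from lines like 'min:1,23,62'."""
    ops = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, _, numbers = line.partition(':')
        ops[name] = [parse_number(n) for n in numbers.split(',')]
    return ops

sample = io.StringIO("min:1,23,62,256\nP90:23,30,45.23\n")
print(parse_ops(sample))  # {'min': [1, 23, 62, 256], 'P90': [23, 30, 45.23]}
```

With the real file this becomes: with open('input.txt') as f: myDictionary = parse_ops(f).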

Python3.5.1: Appending to a list from a list doesn't work

What I'm trying to do is have my Python code read a .txt file with ";"-separated values on each line, split each line into a list of values, and finally append those values to their assigned lists.
Here's what I've tried...
pullData = open("example.txt", "r", encoding='utf-8').read()
dataArray = pullData.split('\n')
array_one = []
array_two = []
for eachLine in dataArray:
    lineArray = eachLine.split(';')
    array_one.append(lineArray[0])
    array_two.append(lineArray[1])
This example results in an error:
Traceback (most recent call last):
  File "MyPath.py", line 25, in <module>
    array_two.append(lineArray[1])
IndexError: list index out of range
The splitting of each line seems to work as it should, because printing the values works just fine, i.e.:
for eachLine in dataArray:
    lineArray = eachLine.split(';')
    print(lineArray[0])
    print(lineArray[1])
...as the above returns what it should.
>>>
RESTART: MyPath.py
Jeff
1009
Bill
771
Any ideas on what the problem could be here...?
P.S. The data (i.e. "example.txt") is something like this:
Jeff;1009;3486;24047
Bill;771;371;3867
Michael;931;2131;3331
Jess;3311;9761;3886
Cathy;571;1301;63668
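Perhaps you have an empty line at the end of the file. If the text ends with a newline, split('\n') produces a final empty string, and ''.split(';') is [''], which has no index 1. Try: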
for eachLine in dataArray:
    lineArray = eachLine.split(';')
    if len(lineArray) >= 2:
        array_one.append(lineArray[0])
        array_two.append(lineArray[1])
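Alternatively, the csv module can do the splitting: csv.reader accepts a delimiter argument and yields an empty list for a blank line, so the same length check skips blank or short lines. read_columns is a hypothetical helper, and io.StringIO stands in for example.txt; with the real file, open it as open('example.txt', newline='', encoding='utf-8').

```python
import csv
import io

def read_columns(f):
    """Collect the first and second ';'-separated fields of each line."""
    names, first_values = [], []
    for fields in csv.reader(f, delimiter=';'):
        if len(fields) < 2:
            continue  # blank or malformed line
        names.append(fields[0])
        first_values.append(fields[1])
    return names, first_values

data = io.StringIO("Jeff;1009;3486;24047\nBill;771;371;3867\n\n")
print(read_columns(data))  # (['Jeff', 'Bill'], ['1009', '771'])
```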

out of bounds error when using a list as an index

I have two files: one has a single column (call it pred) and no header; the other has two columns, ID and IsClick, and a header. My goal is to use the ID column as an index into pred.
import pandas as pd
import numpy as np

def LinesInFile(path):
    with open(path) as f:
        for linecount, line in enumerate(f):
            pass
        f.close()
    print 'Found ' + str(linecount) + ' lines'
    return linecount

path = '/Users/mas/Documents/workspace/Avito/input/' # path to testing file
submission = path + 'submission1234.csv'
lines = LinesInFile(submission)
lines = LinesInFile(path + 'sampleSubmission.csv')
sample = pd.read_csv(path + 'sampleSubmission.csv')
preds = np.array(pd.read_csv(submission, header = None))
index = sample.ID.values - 1
print index
print len(index)
sample['IsClick'] = preds[index]
sample.to_csv('submission.csv', index=False)
The output is:
Found 7816360 lines
Found 7816361 lines
[ 0 4 5 ..., 15961507 15961508 15961511]
7816361
Traceback (most recent call last):
  File "/Users/mas/Documents/workspace/Avito/July3b.py", line 23, in <module>
    sample['IsClick'] = preds[index]
IndexError: index 7816362 is out of bounds for axis 0 with size 7816361
There seems to be something wrong, because my file has 7816361 lines counting the header, while my list has an extra element (len of list is 7816361).
I don't have your CSV files to recreate the problem, but the problem looks like it is being caused by your use of index.
index = sample.ID.values - 1 takes each of your sample IDs and subtracts 1. These are not positions in preds, which has only 7816361 rows (valid indices 0 to 7816360, per the size in the error message). The last 3 items in your index array (based on your print output) are all far above 7816360, so they would go out of bounds. I suspect the error is showing you the first ID - 1 value that goes out of bounds.
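Assuming you just want to join the files by line number, you could drop the existing IsClick column from the sample file (otherwise you end up with two columns named IsClick) and concatenate the predictions alongside it:
sample = pd.concat((pd.read_csv(path + 'sampleSubmission.csv').drop('IsClick', axis=1),
                    pd.read_csv(submission, header=None).rename(columns={0: 'IsClick'})),
                   axis=1)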
Otherwise you'll need to perform a join or merge on your two dataframes.
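On the line counts themselves: LinesInFile reports one less than the number of lines, because enumerate() starts counting at 0, so after the loop linecount holds the index of the last line rather than the count. (It would also raise UnboundLocalError on an empty file, since the loop body never runs.) A minimal stdlib sketch of a correct counter, with count_lines as a hypothetical name and io.StringIO standing in for a real file:

```python
import io

def count_lines(f):
    """Return the number of lines in an open file (0 for an empty file)."""
    return sum(1 for _ in f)

text = "ID,IsClick\n1,0.5\n5,0.1\n"  # a header plus 2 data rows

# What LinesInFile effectively does: enumerate() starts at 0, so the
# variable ends up holding the index of the last line, not the count.
last_index = None
for last_index, _ in enumerate(io.StringIO(text)):
    pass

print(count_lines(io.StringIO(text)))  # 3
print(last_index)                      # 2
```

Applied to the question's output, LinesInFile printing 7816361 for sampleSubmission.csv means that file has 7816362 lines: one header plus 7816361 data rows, which matches len(index) == 7816361. Likewise, submission1234.csv has 7816361 lines, matching the size 7816361 in the error message. So the row counts agree; the IndexError comes from the ID values, not from an extra element.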

Sequential Search: Input Length Isn't Detected Python

I'm writing a program where the user inputs a list of numbers and is then asked for a number whose position in the list the program should return. (Ex. 3,5,1,9,12,6 --> find the position in the list where 9 occurs)
I can get this to work if I hard-code the list and the search number, but I'm having trouble with input. Mostly my problem is that Python isn't detecting the length of the list of numbers, but I'm not sure how to fix this.
Here is the code I have:
def List(line):
    list = []
    for e in line.split(','):
        list.append(int(e))

def Search(num, list):
    for i in range(len(list)):
        if list[i] == num:
            return i
    return -1

def main():
    line = input("Enter list of numbers separated by commas: ")
    p = input("Number searching for")
    print(List(line))
    a = Search(p, list)
    print(a)

main()
And here's the error:
Traceback (most recent call last):
  File "G:\Final Exam Practice\linearsearch.py", line 24, in <module>
    main()
  File "G:\Final Exam Practice\linearsearch.py", line 19, in main
    a = Search(p, list)
  File "G:\Final Exam Practice\linearsearch.py", line 7, in Search
    for i in range(len(list)):
TypeError: object of type 'type' has no len()
First, Python already has a method for this: list.index returns the index of the first occurrence of an item (and raises ValueError if the item is not in the list):
>>> mylist = [3,5,1,9,12,6]
>>> mylist.index(9)
3
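Next, the TypeError is raised because, inside main(), list is not your list of numbers at all: it is Python's built-in list type (a built-in name, not a reserved keyword, which is why you can shadow it). The list built inside List() is local to that function, and List() never returns it. If you want to pass a list to your Search function, don't name it 'list'.
Renaming the variable is only part of the fix: List() also has to return the list, main() has to store the result and pass it to Search(), and p has to be converted with int(p), because input() returns a string in Python 3 and a string never compares equal to an int. Additionally, here's another way to define search (Python function names are usually lowercase):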
def search(num, mylist):
    try:
        return mylist.index(num)
    except ValueError:  # raised when num is not in mylist
        return -1
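Putting the fixes together, a sketch of the asker's program with the sequential search kept as an explicit loop might look like this. parse_numbers and sequential_search are hypothetical names; enumerate() supplies each index and value together, and both inputs are converted with int() because input() returns a string in Python 3.

```python
def parse_numbers(line):
    """Turn '3,5,1,9' into [3, 5, 1, 9]; int() tolerates spaces around items."""
    return [int(e) for e in line.split(',')]

def sequential_search(num, numbers):
    """Return the index of the first element equal to num, or -1."""
    for i, value in enumerate(numbers):
        if value == num:
            return i
    return -1

def main():
    numbers = parse_numbers(input("Enter list of numbers separated by commas: "))
    target = int(input("Number searching for: "))
    print(numbers)
    print(sequential_search(target, numbers))
```

Call main() at the bottom of the script to run it interactively. The functions can also be checked directly: sequential_search(9, parse_numbers("3,5,1,9,12,6")) returns 3.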

IndexError, but more likely I/O error

I'm unsure why I am getting this error. I'm reading from a file called columns_unsorted.txt, then trying to write to columns_sorted.txt. The error is on fan_on = string_j[1], saying list index out of range. Here's my code:
#!/usr/bin/python
import fileinput
import collections
# open document to read from
j = open('./columns_unsorted.txt', 'r')
# note: each row of this file holds space-delimited data in the format <1384055277275353 0 0 0 1 0 0 0 0 22:47:57>: the first term is unix time, the last is human-readable time, and the binary values in between indicate which machine event happened
# open document to record results into
l = open('./columns_sorted.txt', 'w')
# CREATE ARRAY CALLED EVENTS
events = collections.deque()
i = 1
# FILL ARRAY WITH "FACTS" ROWS; SPLIT INTO FIELDS, CHANGE TYPES AS APPROPRIATE
for line in j:  # columns_unsorted
    line = line.rstrip('\n')
    string_j = line.split(' ')
    time = str(string_j[0])
    fan_on = int(string_j[1])
    fan_off = int(string_j[2])
    heater_on = int(string_j[3])
    heater_off = int(string_j[4])
    space_on = int(string_j[5])
    space_off = int(string_j[6])
    pump_on = int(string_j[7])
    pump_off = int(string_j[8])
    event_time = str(string_j[9])
    row = time, fan_on, fan_off, heater_on, heater_off, space_on, space_off, pump_on, pump_off, event_time
    events.append(row)
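You could read the lines with readlines() first:
j = open('./columns_unsorted.txt', 'r')
lines = j.readlines()
for line in lines:
    # what you want to do with each line
    pass
Note, though, that iterating over the file object directly, as your code already does, yields the same lines one at a time, so readlines() alone won't fix an IndexError.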
In the future, print some of your variables, both to make sure the code is working as you want it to and to help you identify problems.
(For example, if you printed string_j in your code, you would see what kind of problem you have.)
The problem was an inconsistent line in the data file. Forgive my haste in posting.
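Since the culprit turned out to be a malformed line, one defensive option is to check the field count before indexing and report offending lines instead of crashing. A minimal Python 3 sketch (the question's code is Python 2 style); read_events and EXPECTED_FIELDS are hypothetical names, and it assumes every valid row has exactly 10 whitespace-separated fields, as in the sample row in the question. io.StringIO stands in for columns_unsorted.txt.

```python
import collections
import io

EXPECTED_FIELDS = 10  # unix time, 8 binary flags, human-readable time

def read_events(f):
    """Parse event rows; collect lines with the wrong number of fields."""
    events = collections.deque()
    bad_lines = []
    for lineno, line in enumerate(f, start=1):
        # split() with no argument splits on runs of whitespace and ignores the
        # trailing newline, unlike split(' '), which yields '' for double spaces
        fields = line.split()
        if len(fields) != EXPECTED_FIELDS:
            bad_lines.append((lineno, line.rstrip('\n')))
            continue
        flags = [int(x) for x in fields[1:9]]
        events.append((fields[0], *flags, fields[9]))
    return events, bad_lines

data = io.StringIO(
    "1384055277275353 0 0 0 1 0 0 0 0 22:47:57\n"
    "\n"
    "1384055277275999 1 0\n"
)
events, bad_lines = read_events(data)
print(len(events))  # 1
print(bad_lines)    # [(2, ''), (3, '1384055277275999 1 0')]
```

Collecting the bad lines with their line numbers, rather than silently skipping them, makes an inconsistent row like the one that caused this error easy to locate.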