I am trying to append to a pickle file using cPickle in Python 2.7, but it does not append.
Code:
import cPickle
import numpy
a = numpy.array([[1, 2], [3, 4]])
output = open("1.pkl", 'wb')
cPickle.dump(a, output)

a = numpy.array([[4, 5], [6, 7]])
output = open("1.pkl", 'ab')
cPickle.dump(a, output)

print(cPickle.load(open("1.pkl", 'rb')))
Output:
[[1 2]
[3 4]]
Previously, I was using this method to append the arrays to text files.
Code:
a = numpy.array([[1, 2], [3, 4]])
text_file = open("1.txt", "w")
numpy.savetxt(text_file, a)
text_file.close()

a = numpy.array([[4, 5], [6, 7]])
text_file = open("1.txt", "a")
numpy.savetxt(text_file, a)
text_file.close()

text_file = open("1.txt", "r")
print(text_file.read())
Output:
1.000000000000000000e+00 2.000000000000000000e+00
3.000000000000000000e+00 4.000000000000000000e+00
4.000000000000000000e+00 5.000000000000000000e+00
6.000000000000000000e+00 7.000000000000000000e+00
I was using this to write the output of a Python simulation I set up for power systems. The output data is huge, around 7 GB, and the writing process was slowing down the simulation a lot. I read that cPickle can make the writing process faster.
How do I append to the cPickle output file without having to read the whole data?
Or is there a better alternative to cPickle to make writing faster?
I don't believe you can just append to a pickle, at least not in a way that makes sense.
If you just get the current serialized version of an object and add another serialized object at the end of the file, it wouldn't just magically append the second object to the original list.
You would need to read in the original object, append to it in Python, and then dump it back.
import cPickle as pickle
import numpy as np
filename = '1.pkl'
a = np.array([[1, 2],[3, 4]])
b = np.array([[4, 5],[6, 7]])
# dump `a`
with open(filename, 'wb') as output_file:
    pickle.dump(a, output_file, -1)

# load `a` and append `b` to it
with open(filename, 'rb') as output_file:
    old_data = pickle.load(output_file)
new_data = np.vstack([old_data, b])

# dump `new_data`
with open(filename, 'wb') as output_file:
    pickle.dump(new_data, output_file, -1)

# test
with open(filename, 'rb') as output_file:
    print(pickle.load(output_file))
After reading your question a second time, you state that you don't want to read in the whole data again. I suppose this doesn't answer your question then, does it?
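For completeness, and this is only a sketch of a different direction rather than part of the answer above: pickle does let you dump several objects back-to-back into one file, and a single load only returns the first of them, which is exactly the output you saw. To read everything back you call load in a loop until EOFError and stack the pieces yourself, so the reading side still touches all of the data:
import cPickle as pickle
import numpy as np

filename = '1.pkl'

# append one more object to the end of the existing pickle file
with open(filename, 'ab') as f:
    pickle.dump(np.array([[8, 9], [10, 11]]), f)

# read every object back, one pickle.load call per dump
chunks = []
with open(filename, 'rb') as f:
    while True:
        try:
            chunks.append(pickle.load(f))
        except EOFError:
            break
print(np.vstack(chunks))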
Related
I want to convert string data to a 2-D numpy array.
I'm importing a .txt file from a directory which contains:
[[18,1,2018,12,15],
[07,1,2018,12,15],
[03,1,2018,12,15]]
and the code is:
import numpy as np
f = open("/home/pi/timer_database.txt","r")
read = f.read()
x = np.array(list(read))
print(x.size)
print(type(x))
print(x.ndim)
The output is:
47
<type 'numpy.ndarray'>
1
Please help me with this issue.
Use this code:
import numpy as np

# read the raw text and strip the list syntax
f = open("/home/pi/timer_database.txt", "r")
read = f.read()
f.close()
read = read.replace("[", "")
read = read.replace("]", "")
read = read.replace(",\n", "\n")

# write the cleaned text to a new file and load it with loadtxt
f = open("New_Array.txt", "w+")
f.write(read)
f.close()

Array = np.loadtxt("New_Array.txt", delimiter=',')
print(Array)
You can use ast to evaluate your string, which is much easier than parsing the whole thing:
import ast
x=np.array(ast.literal_eval(read))
Or simply eval:
x=np.array(eval(read))
But this will raise an error because of the leading zeros you have, so first simply remove them:
import re
read=re.sub(r'\b0','',read)
Also, if you are the one writing the file, it is much more advisable to use other approaches; as a first suggestion, simply use pickle.
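A minimal sketch of that pickle suggestion (the .pkl path below is just a placeholder I made up, and the nested list mirrors the data from the question):
import pickle

data = [[18, 1, 2018, 12, 15],
        [7, 1, 2018, 12, 15],
        [3, 1, 2018, 12, 15]]

# write the structure directly, no hand-rolled text format
with open("/home/pi/timer_database.pkl", "wb") as f:
    pickle.dump(data, f)

# read it back without any string parsing
with open("/home/pi/timer_database.pkl", "rb") as f:
    restored = pickle.load(f)
print(restored)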
I have a folder where I store files from my fitting model in .txt format.
My question is how to write a loop which will take a specific value, e.g. p1_cen 7.65782003, from each file and append it to a column in a .csv file.
There are 288 of those files, because I store 5-minute-long data for each day, and the loop needs to pull that specific value out of each of the 288 files. Do you have any ideas how to do this?
For now, I have this code, which writes the data from my lmfit model to .txt files:
with open('S:\Doc\Python\Results\DecompositionBx ' + "{0}".format(Station) + "{0}{1}".format(Start_time_hours_format, Start_time_minutes_format) + ".txt", 'w') as fh:
    fh.write(result.fit_report(show_correl=False))
Btw., my files are named as follows:
DecompositionBxHylaty0000
...
DecompositionBxHylaty2355
UPDATE!!!
So the code from @bobrobbob works:
import csv
from datetime import timedelta
data = []
for i in range(288):
    skip = i * timedelta(minutes=5)
    hours, minutes, _ = str(skip).split(':')
    filename = "S:\Dok\Python\Results\DecompositionBx Hylaty%02d%02d.txt" % (int(hours), int(minutes))
    with open(filename) as f:
        lines = f.readlines()
        for line in lines:
            if line.startswith(' p1_cen'):
                data.append(line.split('+')[0])
                break

with open('S:\Dok\Python\Results\data.csv', 'w') as f:
    writer = csv.writer(f)
    for line in data:
        writer.writerow(line)
I get something like this, which is nearly perfect:
A bit ugly on the time handling; maybe someone will come up with a cleaner solution, but it should work nonetheless.
import csv
from datetime import timedelta
data = []
for i in range(288):
    skip = i * timedelta(minutes=5)
    hours, minutes, _ = str(skip).split(':')
    filename = "DecompositionBxHylaty%02d%02d" % (int(hours), int(minutes))
    with open(filename) as f:
        lines = f.readlines()
        for line in lines:
            if line.startswith('p1_cen'):
                data.append(line.split('+')[0].strip())
                break

with open('data.csv', 'w', newline='') as f:
    writer = csv.writer(f, delimiter=' ')
    for line in data:
        writer.writerow(line.split())
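On the answer's own caveat about the time handling: one possibly cleaner variant (just a sketch, not from the original answer) is to build the hours and minutes with integer arithmetic instead of parsing str(timedelta):
# hypothetical replacement for the timedelta/str parsing above
for i in range(288):
    hours, minutes = divmod(i * 5, 60)  # 5-minute steps: 0..23 hours, 0..55 minutes
    filename = "DecompositionBxHylaty%02d%02d" % (hours, minutes)
    # ...open and scan the file exactly as in the loop above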
I have two parameters, filename and time, and I want to write them as columns in a csv file. These two parameters are set in a for-loop, so their values change on each iteration.
My current Python code is below, but the resulting csv is not what I want:
import csv
import os
with open("txt/scalable_decoding_time.csv", "wb") as csv_file:
writer = csv.writer(csv_file, delimiter=',')
filename = ["one","two", "three"]
time = ["1","2", "3"]
zipped_lists = zip(filename,time)
for row in zipped_lists:
print row
writer.writerow(row)
My csv file must look like below. The , must be the delimiter, so I must get two columns.
one, 1
two, 2
three, 3
My csv file currently looks like the following picture; the data are stored in one column.
Do you know how to fix this?
Well, the issue here is that you are using writerows instead of writerow.
import csv
import os
with open("scalable_decoding_time.csv", "wb") as csv_file:
writer = csv.writer(csv_file, delimiter=',')
level_counter = 0
max_levels = 3
filename = ["one","two", "three"]
time = ["1","2", "3"]
while level_counter < max_levels:
writer.writerow((filename[level_counter], time[level_counter]))
level_counter = level_counter +1
This gave me the result:
one,1
two,2
three,3
This is another solution.
Put the following code into a Python script that we will call sc-123.py:
filename = ["one","two", "three"]
time = ["1","2", "3"]
for a, b in zip(filename, time):
    print('{}{}{}'.format(a, ',', b))
Once the script is ready, run it like this:
python2 sc-123.py > scalable_decoding_time.csv
You will have the results formatted the way you want
one,1
two,2
three,3
import csv
reader = csv.reader(post.text, quotechar="'")
with open('source91.csv', 'wb') as f:
    writer = csv.writer(f)
    writer.writerows(list(reader))
The output is showing vertically; I need to print the data horizontally in the CSV.
Simple answer: if you have only one array.
with open('source91.csv', 'wb') as f:
    writer = csv.writer(f, delimiter='\n')
    writer.writerows(list(reader))
Complicated answer:
You may need numpy to make this happen; transpose simply converts rows to columns.
import numpy as np
a = np.array(list(reader))
a = np.append(a, list(reader)) # if you have multiple lines
a = np.transpose(a)
np.savetxt('source91.csv', a)
When running this simple script, the "output_file.csv" remains open. I am unsure about how to close the file in this scenario.
I have looked at other examples where open() is assigned to a variable such as f and the object is closed using f.close(). Because of the with ... as csv_file, I am unclear as to where the file object actually is. Would anyone mind conceptually explaining where the disconnect is here? Ideally, I would like to know:
how to check namespace for all open file objects
how to determine the proper method for closing these objects
This is a simple script to read columns of data and, where the mapping in column 1 is blank, fill down:
import csv
output_file = csv.writer(open('output_file.csv', 'w'))
csv.register_dialect('mydialect', doublequote=False, quotechar="'")
def csv_writer(data):
    with open('output_file.csv', "ab") as csv_file:
        writer = csv.writer(csv_file, delimiter=',', lineterminator='\r\n', dialect='mydialect')
        writer.writerow(data)

D = [[]]
for line in open('inventory_skus.csv'):
    clean_line = line.strip()
    data_points = clean_line.split(',')
    print data_points
    D.append([line.strip().split(',')[0], line.strip().split(',')[1]])

D2 = D
for i in range(1, len(D)):
    nr = D[i]
    if D[i][0] == '':
        D2[i][0] = D[i-1][0]
    else:
        D2[i] = D[i]

for line in range(1, len(D2)):
    csv_writer(D2[line])
    print D2[line]
Actually, you are creating two file objects (in two different ways). First one:
output_file = csv.writer(open('output_file.csv', 'w'))
This file object is hidden inside the csv.writer and not exposed by it; moreover, you never use that writer at all, let alone close it, so the file stays open until it is garbage-collected.
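If you actually wanted to keep that first writer, a minimal illustration (my own sketch, not part of the original script) is to hold a reference to the underlying file object so you can close it, or to wrap it in a with block as well:
import csv

# keep a reference to the file so it can be closed explicitly
f = open('output_file.csv', 'w')
output_file = csv.writer(f)
output_file.writerow(['example', 'row'])
f.close()

# or let a context manager do the closing
with open('output_file.csv', 'w') as f:
    csv.writer(f).writerow(['example', 'row'])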
In the second case,
with open('output_file.csv',"ab") as csv_file:
you get the file object in csv_file. The context block takes care of closing the object, so no need to close it manually (file objects are context managers).
Manually indexing over D2 is unnecessary. Also, why are you opening the CSV file in binary mode?
def write_data_row(csv_writer, data):
    csv_writer.writerow(data)

with open('output_file.csv', "w") as csv_file:
    writer = csv.writer(csv_file, delimiter=',', lineterminator='\r\n', dialect='mydialect')
    for line in D2[1:]:
        write_data_row(writer, line)
        print line