Python read filenames and save in list - python-2.7

I need to read all the files in a directory, save their names in a list, and then read those files from the list one by one.
I don't want to use an extra module like glob, so I'm trying two different approaches:
First approach:
import os
file_list = os.listdir("jsons")
for files in file_list:
    data = open(files, "r")
output:
['A03DUrQz1BM9SQ2.json', 'A04D5V1u1BMxaV6.json', 'A0kxiHL81AN9pH5.json', 'A1Fxs5Ag1A8vuB5.json', 'A2Dsv7RE1BDqYt5.json', 'A2HkZPkn1BpvvG5.json']
The issue here is that the filenames are stored as plain strings, and trying to open them fails (I assumed the quotes around each name were the problem).
Second approach:
file_list = os.system("ls jsons/")
print file_list.split()
for files in file_list:
    data = open(files, "r")
    print data
output:
Traceback (most recent call last):
  File "asn-1_q3.py", line 9, in <module>
    print file_list.split()
AttributeError: 'int' object has no attribute 'split'
Here the result is saved as an int, so I can't split it to get the filenames.
How should I fix these two approaches?

You need to read your file object, and you need to os.path.join each file name with the original directory name (otherwise Python will look for the files in the current working directory):
import os
import os.path
file_list = os.listdir("jsons")
for file_name in file_list:
    with open(os.path.join("jsons", file_name), "r") as src_file:
        data = src_file.read()
        print(data)
Here's an example that uses generators to limit the amount of data in memory (vs loading all the data into an array):
import os
import os.path

def all_file_content(directory_name):
    file_list = os.listdir(directory_name)
    for file_name in file_list:
        with open(os.path.join(directory_name, file_name), "r") as src_file:
            yield src_file.read()

for file_content in all_file_content("jsons"):
    print(file_content)
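If the directory can also contain subdirectories or unrelated files, it helps to filter before opening. A minimal sketch, assuming the same "jsons" directory as above:
import os
import os.path

# Open only regular files ending in .json; skip subdirectories and anything else.
for file_name in os.listdir("jsons"):
    path = os.path.join("jsons", file_name)
    if os.path.isfile(path) and file_name.endswith(".json"):
        with open(path, "r") as src_file:
            print(src_file.read())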

Related

scheduler produces empty files

I'm using PythonAnywhere for a simple scheduled task.
I want to download data from a link once a day and save CSV files. Later, once I have a decent time series, I'll figure out how I actually want to manage the data. It's not much data, so I don't need anything fancy like a database.
My script takes the data from the Google Sheets link, adds a log column and a time column, then writes a CSV with the date in the filename.
It works exactly as I want when I run it manually on PythonAnywhere, but the scheduler just creates empty CSV files, albeit with the correct names.
Any ideas what's up? I don't understand the log file; surely the error should also happen when it is run manually?
script:
import pandas as pd
import time
import datetime

def write_today(df):
    date = time.strftime("%Y-%m-%d")
    df.to_csv('Properties_' + date + '.csv')

url = 'https://docs.google.com/spreadsheets/d/19h2GmLN-2CLgk79gVxcazxtKqS6rwW36YA-qvuzEpG4/export?format=xlsx'
df = pd.read_excel(url, header=1).rename(columns={'Unnamed: 1': 'code'})
source = pd.read_excel(url).columns[0]
df['source'] = source
df['time'] = datetime.datetime.now()
write_today(df)
the scheduler is set up as so:
log file:
Traceback (most recent call last):
  File "/home/abmoore/load_data.py", line 24, in <module>
    write_today(df)
  File "/home/abmoore/load_data.py", line 16, in write_today
    df.to_csv('Properties_'+date+'.csv')
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 1344, in to_csv
    formatter.save()
  File "/usr/local/lib/python2.7/dist-packages/pandas/formats/format.py", line 1551, in save
    self._save()
  File "/usr/local/lib/python2.7/dist-packages/pandas/formats/format.py", line 1638, in _save
    self._save_header()
  File "/usr/local/lib/python2.7/dist-packages/pandas/formats/format.py", line 1634, in _save_header
    writer.writerow(encoded_labels)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in position 0: ordinal not in range(128)
Your problem there is the UnicodeEncodeError: you have some non-ASCII data in your spreadsheet (u'\xa3' is the pound sign), and the pandas to_csv function defaults to ASCII encoding. Try specifying utf8 instead:
def write_today(df):
    filename = 'Properties_{date}.csv'.format(date=time.strftime("%Y-%m-%d"))
    df.to_csv(filename, encoding='utf8')
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_csv.html
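A quick way to see the failure in isolation, as a minimal sketch (the string below is hypothetical, standing in for a label from the spreadsheet):
# u'\xa3' is the pound sign; the default ascii codec cannot represent it,
# which is exactly what to_csv runs into, while utf8 handles it fine.
label = u'\xa3 1,000'
label.encode('utf8')    # works
label.encode('ascii')   # raises UnicodeEncodeError, like the traceback above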

Can't import python library 'zipfile'

Feel like a dunce. I'm trying to interact with a zip file and can't seem to use the zipfile library. Fairly new to Python.
from zipfile import *
#set filename
fpath = '{}_{}_{}.zip'.format(strDate, day, week)
#use zipfile to get info about ftp file
zip = zipfile.Zipfile(fpath, mode='r')
# doesn't matter if I use
#zip = zipfile.Zipfile(fpath, mode='w')
#or zip = zipfile.Zipfile(fpath, 'wb')
I'm getting this error
zip = zipfile.Zipfile(fpath, mode='r')
NameError: name 'zipfile' is not defined
if I just use import zipfile I get this error:
TypeError: 'module' object is not callable
Two ways to fix it:
1) use from, and in that case drop the zipfile namespace:
from zipfile import *
#set filename
fpath = '{}_{}_{}.zip'.format(strDate, day, week)
#use zipfile to get info about ftp file
zip = ZipFile(fpath, mode='r')
2) use direct import, and in that case use full path like you did:
import zipfile
#set filename
fpath = '{}_{}_{}.zip'.format(strDate, day, week)
#use zipfile to get info about ftp file
zip = zipfile.ZipFile(fpath, mode='r')
and there's a sneaky typo in your code: Zipfile should be ZipFile (capital F), so I feel slightly bad for answering...
So the lesson learnt is:
avoid from x import y, because editors have a harder time completing names;
with a proper import zipfile and an editor that offers completion, you would never have had this problem in the first place.
Easiest way to zip a file using Python:
import zipfile
zf = zipfile.ZipFile("targetZipFileName.zip",'w', compression=zipfile.ZIP_DEFLATED)
zf.write("FileTobeZipped.txt")
zf.close()
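To go the other way and inspect an existing archive (the question mentions wanting info about a file), a minimal sketch, assuming an archive named example.zip:
import zipfile

# Open the archive read-only and list its members; each ZipInfo entry
# also carries size and date information.
zf = zipfile.ZipFile("example.zip", mode='r')
for info in zf.infolist():
    print(info.filename)
zf.close()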

How to open concurrently two files with same name and different extension in python?

I have a folder with multiple pairs of files:
a.txt
a.json
b.txt
b.json
and so on.
Using a for loop, I want to open each pair of files (e.g. a.txt and a.json) concurrently.
Is there a way to do it using the 'with' statement in Python?
You could do something like the following, which constructs a dictionary keyed by the file name sans extension, with a count of how many files match the required extensions. Then you can iterate over the dictionary, opening pairs of files:
import os
from collections import defaultdict

EXTENSIONS = {'.json', '.txt'}
directory = '/path/to/your/files'
grouped_files = defaultdict(int)

for f in os.listdir(directory):
    name, ext = os.path.splitext(os.path.join(directory, f))
    if ext in EXTENSIONS:
        grouped_files[name] += 1

for name in grouped_files:
    if grouped_files[name] == len(EXTENSIONS):
        with open('{}.txt'.format(name)) as txt_file, \
             open('{}.json'.format(name)) as json_file:
            # process files
            print(txt_file, json_file)
I have two folders of different files, one with .jpg and another with .xml; this is how I put matching pairs into another folder:
import os
from pathlib import Path
import shutil

# lists to store the file name stems
picList = list()
xmlList = list()

# listing the two source directories
xmlDir = os.listdir('C:\\Users\\%USERNAME%\\Desktop\\img+xml\\XML')
picDir = os.listdir('C:\\Users\\%USERNAME%\\Desktop\\img+xml\\img')
dest = r'C:\Users\%USERNAME%\Desktop\img+xml\i'

# appending each file name (without extension) to its list
for a in xmlDir:
    xmlList.append(Path(a).stem)
for b in picDir:
    picList.append(Path(b).stem)

# matching stems and moving both files to the destination
for a in xmlList:
    for b in picList:
        if a == b:
            try:
                shutil.move(f'C:\\Users\\%USERNAME%\\Desktop\\img+xml\\XML\\{a}.xml', dest)
                shutil.move(f'C:\\Users\\%USERNAME%\\Desktop\\img+xml\\img\\{b}.jpg', dest)
            except Exception as e:
                print(e)

How do I confirm with Python that required files are in a particular folder and are accessible?

I have 5 files in a folder App:
App
|-- A.txt
|-- B.txt
|-- C.txt
|-- D.txt
|-- E.txt
|-- Run.py
|-- Other Folders or Files
Now I want to know whether the files (A.txt, B.txt, C.txt, D.txt, E.txt) are present or not, and if they are, I want to call a function Cleansing and supply the names of these files to it. I have written this code, but nothing is happening; the function is not getting called.
import glob
import csv
import itertools

files = glob.glob("*.txt")
i = 0

def sublist(a, b):
    seq = iter(b)
    try:
        for x in a:
            while next(seq) != x: pass
        else:
            return True
    except StopIteration:
        pass
    return False

required_files = ['Alternate_ADR6_LFB1.txt', 'Company_Code.txt', 'Left_LIFNR.txt', 'LFA1.txt', 'LFB1.TXT', 'LFBK.TXT']

if sublist(required_files, files):
    for files in required_files:
        try:
            f = open(files, 'r')
            f.close()
        except IOError as e:
            print 'Error opening or accessing files'
    i = 1
else:
    print 'Required files are not in correct folder'

if i == 1:
    for files in required_files:
        Cleansing(files)

def Cleansing(filename):
    with open('filename', 'rb') as f_input:
        ...
        ...
        break
    with open('filename', 'rb') as f_input, open('filename_Cleaned.csv', 'wb') as f_output:
        csv_output = csv.writer(f_output)
        csv_output.writerow('something')
Update
I think I am now able to call the function and also able to check the valid files, but it's not very Pythonic. And I am not able to open or create a file with the name of the file plus _Cleaned: filename_Cleaned.csv.
You want to check whether a list of files (required_files) is present in a folder.
You successfully get the complete list of text files in the folder with files = glob.glob("*.txt")
So the first question is: Checking for sublist in list
As the order is not important, we can use sets:
if set(required_files) <= set(files):
    # do stuff
else:
    # print warning
Next question: how to open the files and create outputs with names like "filename_Cleaned.csv".
A very important thing to understand: "filename" is not the same thing as filename. The first is a string literal; it will always be the same text and will not be replaced by real filenames. When you write open('filename', 'rb'), you're trying to open a file literally called "filename".
filename however can be a variable name and take different values.
for filename in required_files:
    Cleansing(filename)

def Cleansing(filename):
    with open(filename, 'rb') as f_input, open(filename + '_Cleaned.csv', 'wb') as f_output:
        # read stuff in f_input
        # write stuff in f_output
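Putting it together, a minimal runnable sketch (the cleaning rule itself is hypothetical; substitute your own logic):
import csv

def Cleansing(filename):
    # copy each input line into '<name>_Cleaned.csv', stripped of whitespace
    with open(filename, 'rb') as f_input, open(filename + '_Cleaned.csv', 'wb') as f_output:
        csv_output = csv.writer(f_output)
        for line in f_input:
            csv_output.writerow([line.strip()])

for filename in required_files:
    Cleansing(filename)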

PYPDF watermarking returns error

Hi, I'm trying to watermark a PDF file using PyPDF2, but I get an error I can't figure out.
I get the following error:
Traceback (most recent call last):
  File "test.py", line 13, in <module>
    page.mergePage(watermark.getPage(0))
  File "C:\Python27\site-packages\PyPDF2\pdf.py", line 1594, in mergePage
    self._mergePage(page2)
  File "C:\Python27\site-packages\PyPDF2\pdf.py", line 1651, in _mergePage
    page2Content, rename, self.pdf)
  File "C:\Python27\site-packages\PyPDF2\pdf.py", line 1547, in _contentStreamRename
    op = operands[i]
KeyError: 0
Using Python 2.7.6 with PyPDF2 1.19 on Windows 32-bit.
Hopefully someone can tell me what I'm doing wrong.
My Python file:
from PyPDF2 import PdfFileWriter, PdfFileReader
output = PdfFileWriter()
input = PdfFileReader(open("test.pdf", "rb"))
watermark = PdfFileReader(open("watermark.pdf", "rb"))
# print how many pages input1 has:
print("test.pdf has %d pages." % input.getNumPages())
print("watermark.pdf has %d pages." % watermark.getNumPages())
# add page 0 from input, but first add a watermark from another PDF:
page = input.getPage(0)
page.mergePage(watermark.getPage(0))
output.addPage(page)
# finally, write "output" to document-output.pdf
outputStream = file("outputs.pdf", "wb")
output.write(outputStream)
outputStream.close()
Try writing to a StringIO object instead of a disk file. So, replace this:
outputStream = file("outputs.pdf", "wb")
output.write(outputStream)
outputStream.close()
with this:
import StringIO

outputStream = StringIO.StringIO()
output.write(outputStream)  # write merged output to the StringIO object
outputStream.close()
If the above code works, then you might be having file-writing permission issues. For reference, look at the PyPDF working example in my article.
I encountered this error when attempting to use PyPDF2 to merge in a page that had been generated by reportlab using an inline image, canvas.drawInlineImage(...), which stores the image data in the content stream of the PDF. Other PDFs that use a similar technique for images might be affected in the same way: effectively, the content stream of the PDF has a data object thrown into it where PyPDF2 doesn't expect one.
If you're able to, one solution is to regenerate the source PDF without inline content-stream-stored images, e.g. by generating with canvas.drawImage(...) in reportlab.
Here's an issue about this on PyPDF2.
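A minimal sketch of that workaround in reportlab (the file names and coordinates are hypothetical): drawImage embeds the picture as a separate image XObject rather than inline in the page's content stream, which avoids the data object that trips up PyPDF2's mergePage.
from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas

c = canvas.Canvas("watermark.pdf", pagesize=letter)
# drawImage stores the image as an XObject; drawInlineImage would embed it
# directly in the content stream, which is what caused the KeyError above.
c.drawImage("logo.png", 200, 400, width=200, height=100)
c.save()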