PYPDF watermarking returns error - python-2.7

hi im trying to watermark a pdf fileusing pypdf2 though i get this error i cant figure out what goes wrong.
i get the following error:
Traceback (most recent call last): File "test.py", line 13, in <module>
page.mergePage(watermark.getPage(0)) File "C:\Python27\site-packages\PyPDF2\pdf.py", line 1594, in mergePage
self._mergePage(page2) File "C:\Python27\site-packages\PyPDF2\pdf.py", line 1651, in _mergePage
page2Content, rename, self.pdf) File "C:Python27\site-packages\PyPDF2\pdf.py", line 1547, in
_contentStreamRename
op = operands[i] KeyError: 0
using python 2.7.6 with pypdf2 1.19 on windows 32bit.
hopefully someone can tell me what i do wrong.
my python file:
from PyPDF2 import PdfFileWriter, PdfFileReader
output = PdfFileWriter()
input = PdfFileReader(open("test.pdf", "rb"))
watermark = PdfFileReader(open("watermark.pdf", "rb"))
# print how many pages input1 has:
print("test.pdf has %d pages." % input.getNumPages())
print("watermark.pdf has %d pages." % watermark.getNumPages())
# add page 0 from input, but first add a watermark from another PDF:
page = input.getPage(0)
page.mergePage(watermark.getPage(0))
output.addPage(page)
# finally, write "output" to document-output.pdf
outputStream = file("outputs.pdf", "wb")
output.write(outputStream)
outputStream.close()

Try writing to a StringIO object instead of a disk file. So, replace this:
outputStream = file("outputs.pdf", "wb")
output.write(outputStream)
outputStream.close()
with this:
outputStream = StringIO.StringIO()
output.write(outputStream) #write merged output to the StringIO object
outputStream.close()
If above code works, then you might be having file writing permission issues. For reference, look at the PyPDF working example in my article.

I encountered this error when attempting to use PyPDF2 to merge in a page which had been generated by reportlab, which used an inline image canvas.drawInlineImage(...), which stores the image in the object stream of the PDF. Other PDFs that use a similar technique for images might be affected in the same way -- effectively, the content stream of the PDF has a data object thrown into it where PyPDF2 doesn't expect it.
If you're able to, a solution can be to re-generate the source pdf, but to not use inline content-stream-stored images -- e.g. generate with canvas.drawImage(...) in reportlab.
Here's an issue about this on PyPDF2.

Related

Google Vision API 'TypeError: invalid file'

The following piece of code comes from Google's Vision API Documentation, the only modification I've made is adding the argument parser for the function at the bottom.
import argparse
import os
from google.cloud import vision
import io
def detect_text(path):
"""Detects text in the file."""
client = vision.ImageAnnotatorClient()
with io.open(path, 'rb') as image_file:
content = image_file.read()
image = vision.types.Image(content=content)
response = client.text_detection(image=image)
texts = response.text_annotations
print('Texts:')
for text in texts:
print('\n"{}"'.format(text.description))
vertices = (['({},{})'.format(vertex.x, vertex.y)
for vertex in text.bounding_poly.vertices])
print('bounds: {}'.format(','.join(vertices)))
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", type=str,
help="path to input image")
args = vars(ap.parse_args())
detect_text(args)
If I run it from a terminal like below, I get this invalid file error:
PS C:\VisionTest> python visionTest.py --image C:\VisionTest\test.png
Traceback (most recent call last):
File "visionTest.py", line 31, in <module>
detect_text(args)
File "visionTest.py", line 10, in detect_text
with io.open(path, 'rb') as image_file:
TypeError: invalid file: {'image': 'C:\\VisionTest\\test.png'}
I've tried with various images and image types as well as running the code from different locations with no success.
Seems like either the file doesn't exist or is corrupt since it isn't even read. Can you try another image and validate it is in the location you expect?

Saving files in Python using "with" method

I am wanting to create a file and save it to json format. Every example I find specifies the 'open' method. I am using Python 2.7 on Windows. Please help me understand why the 'open' is necessary for a file I am saving for the first time.
I have read every tutorial I could find and researched this issue but with no luck still. I do not want to create the file outside of my program and then have my program overwrite it.
Here is my code:
def savefile():
filename = filedialog.asksaveasfilename(initialdir =
"./Documents/WorkingDirectory/",title = "Save file",filetypes = (("JSON
files","*.json"), ("All files", "*.")))
with open(filename, 'r+') as currentfile:
data = currentfile.read()
print (data)
Here is this error I get:
Exception in Tkinter callback Traceback (most recent call last):
File "C:\Python27\lib\lib-tk\Tkinter.py", line 1542, in call
return self.func(*args) File "C:\Users\CurrentUser\Desktop\newproject.py", line 174, in savefile
with open(filename, 'r+') as currentfile: IOError: [Errno 2] No such file or directory:
u'C:/Users/CurrentUser/Documents/WorkingDirectory/test.json'
Ok, I figured it out! The problem was the mode "r+". Since I am creating the file, there is no need for read and write, just write. So I changed the mode to 'w' and that fixed it. I also added the '.json' so it would be automatically added after the filename.
def savefile():
filename = filedialog.asksaveasfilename(initialdir =
"./Documents/WorkingDirectory/",title = "Save file",filetypes = (("JSON
files","*.json"), ("All files", "*.")))
with open(filename + ".json", 'w') as currentfile:
line1 = currentfile.write(stringone)
line2 = currentfile.write(stringtwo)
print (line1,line2)

scheduler produces empty files

I'm using pythonanywhere for a simple scheduled task.
I want to download data from a link once a day and save csv files. Later once i have a decent time series I'll figure out how I actually want to manage the data. It's not much data so don't need anything fancy like a database.
My script takes the data from the google sheets link, adds a log column and a time column, then writes a csv with the date in the filename.
It works exactly as I want it to when I run it manually in pythonanywhere, but the scheduler is just creating empty csv files albeit with the correct name.
Any ideas what's up? I don't understand the log file. Surely the error should happen when it is run manually?
script:
import pandas as pd
import time
import datetime
def write_today(df):
date = time.strftime("%Y-%m-%d")
df.to_csv('Properties_'+date+'.csv')
url = 'https://docs.google.com/spreadsheets/d/19h2GmLN-2CLgk79gVxcazxtKqS6rwW36YA-qvuzEpG4/export?format=xlsx'
df = pd.read_excel(url, header=1).rename(columns={'Unnamed: 1':'code'})
source = pd.read_excel(url).columns[0]
df['source'] = source
df['time'] = datetime.datetime.now()
write_today(df)
the scheduler is set up as so:
log file:
Traceback (most recent call last):
File "/home/abmoore/load_data.py", line 24, in <module>
write_today(df)
File "/home/abmoore/load_data.py", line 16, in write_today
df.to_csv('Properties_'+date+'.csv')
File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 1344, in to_csv
formatter.save()
File "/usr/local/lib/python2.7/dist-packages/pandas/formats/format.py", line 1551, in save
self._save()
File "/usr/local/lib/python2.7/dist-packages/pandas/formats/format.py", line 1638, in _save
self._save_header()
File "/usr/local/lib/python2.7/dist-packages/pandas/formats/format.py", line 1634, in _save_header
writer.writerow(encoded_labels)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in position 0: ordinal not in range(128)
Your problem there is the UnicodeDecodeError -- you have some non-ascii data in your spreadsheet, and the pandas to_csv function defaults to ascii encoding. try specifying utf8 instead:
def write_today(df):
filename = 'Properties_{date}.csv'.format(date=time.strftime("%Y-%m-%d"))
df.to_csv(filename, encoding='utf8')
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_csv.html

Using scraperwiki for pdf-file on disk

I am trying to get some data out of a pdf document using scraperwiki for pyhon. It works beautifully if I download the file using urllib2 like so:
pdfdata = urllib2.urlopen(url).read()
xmldata = scraperwiki.pdftoxml(pdfdata)
root = lxml.html.fromstring(xmldata)
pages = list(root)
But here comes the tricky part. As I would like to do this for a large number of pdf-files that I have on my disk, I would like to do away with the first line and pass the pdf file directly as an argument. However, if I try
pdfdata = open("filename.pdf","wb")
xmldata = scraperwiki.pdftoxml(pdfdata)
root = lxml.html.fromstring(xmldata)
I get the following error
xmldata = scraperwiki.pdftoxml(pdfdata)
File "/usr/local/lib/python2.7/dist-packages/scraperwiki/utils.py", line 44, in pdftoxml
pdffout.write(pdfdata)
TypeError: must be string or buffer, not file
I am guessing that this occurs because I do not open the pdf correctly?
If so, is there a way to open a pdf from disk just like urllib2.urlopen() does?
urllib2.urlopen(...).read() does just that it reads the contents of the stream returned from the url you passed as a parameter.
While open() returns a file handler. Just as urllib2 needed to do an open() call then a read() call so does file handlers.
Change your program to use the the following lines:
with open("filename.pdf", "rb") as pdffile:
pdfdata=pdffile.read()
xmldata = scraperwiki.pdftoxml(pdfdata)
root = lxml.html.fromstring(xmldata)
This will open your pdf then read the contents into a buffer named pdfdata. From there your call to scraperwiki.pdftoxml() will work as expected.

How to read & export certain files from a Python GUI-prompted directory?

OK guys,
I'm currently working on a file reading and processing with Python & OpenCV cs' GUI feature. The feature will prompt the user to select a directory path for a folder containing 340 JPEG images, which I labelled them as "frame1" to "frame340". Then, I want to select several frames, process them, and save the processed ones in a different directory.
My big issue is, I'm trying to get only frame87, frame164, and frame248 from this folder with 340 images, and Python just keep returning error that claimed "directory name is invalid", like this:
Traceback (most recent call last):
File "C:\Users\maxwell_hamzah\Documents\Python27\imageReadBeta.py", line 25, in <module>
imgRead = os.listdir(str(dirname) + "/frame"+ str(i) + ".jpg")
WindowsError: [Error 267] The directory name is invalid: 'C:/Users/maxwell_hamzah/Documents/Python27/toby arm framed/frame87.jpg/*.*'
To help familiarize with the situation, here's what my work looks like:
import os
import numpy as np
import cv2
from matplotlib import pyplot as plt
from skimage import color, data, restoration
import Tkinter, tkFileDialog
# first, we setup the Tkinter features for file-reading
root = Tkinter.Tk()
root.withdraw()
# prompt user to ask about the file directory
dirname = tkFileDialog.askdirectory\
(parent=root,initialdir="/",title='Pick FRAMES directory')
X = [] # initiate an array to store read images
frameIndex = [87, 163, 248] #this index is which frames we are interested in
imgRead = ""
temp = []
# we begin to read only frame87, frame163, and frame248
for i in frameIndex:
imgRead = os.listdir(str(dirname) + "/frame"+ str(i) + ".jpg")
temp = cv2.imread(imgRead, -1)
X.append(temp)
I'm totally stuck on how to fix this bug on especially the for loop part, where the error comes from. Python keeps freeking out on the imgRead variable claiming that the directory is invalid. Plus, I'm also wondering on how to "export" processed files to other directories (e.g. saving processed images from "My Pictures" to "My Music")
Really appreciate your help, guys.
Maxwell
In the last block, you call a method to list files, which is expecting a directory, but you pass it a file path. That's a bug, and actually you don't need that here in the first place:
for i in frameIndex:
imgRead = "{0}/frame{1}.jpg".format(dirname, i)
temp = cv2.imread(imgRead, -1)
X.append(temp)
As to moving files in Python, that's a pretty classic need, there's plenty of doc out there. One example.