Convert image to PDF with Django? - django

Receive multiple images as input from the user and convert them into a PDF. I don't understand how to implement this with Django.

Use this command to install the package:
pip install img2pdf
Below is the implementation:
An image can be converted into PDF bytes using the img2pdf.convert() function provided by the img2pdf module; the PDF file is then opened in "wb" mode and the bytes are written to it.
# Python3 program to convert image to pdf
# using img2pdf library
# importing necessary libraries
import img2pdf
from PIL import Image
import os
# storing image path
img_path = "C:/Users/Admin/Desktop/GfG_images/do_nawab.png"
# storing pdf path
pdf_path = "C:/Users/Admin/Desktop/GfG_images/file.pdf"
# opening image
image = Image.open(img_path)
# converting into pdf bytes using img2pdf
pdf_bytes = img2pdf.convert(image.filename)
# opening or creating pdf file
file = open(pdf_path, "wb")
# writing the pdf bytes to the file
file.write(pdf_bytes)
# closing image file
image.close()
# closing pdf file
file.close()
# output
print("Successfully made pdf file")

Pillow supports the PDF format; see the Pillow image file formats documentation.
from PIL import Image
img = Image.open('/path/to/image.jpg')
img = img.convert('RGB')  # removes the alpha channel from .png images
img.save('/path/to/image.pdf', format="PDF")
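Since the question mentions multiple images, it may help that Pillow can also write them all into one multi-page PDF via save_all and append_images. A small sketch (the function name and paths are placeholders):

```python
# Combine several images into a single multi-page PDF with Pillow.
from PIL import Image

def combine_images_to_pdf(paths, pdf_path):
    # drop alpha channels so every page can be written to PDF
    images = [Image.open(p).convert("RGB") for p in paths]
    first, rest = images[0], images[1:]
    # save_all + append_images writes one page per image
    first.save(pdf_path, format="PDF", save_all=True, append_images=rest)
```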

Related

Kernel restarts when compressing tif file using PIL in Anaconda

I'm trying to compress a bunch of TIFF files with the Pillow package. However, when I execute the code in Python 3.7.13 in the Spyder IDE within an Anaconda3 environment, the kernel restarts at the line where the TIFF file should be compressed. I tried different compression methods (e.g. "group4", "zlib", "deflate").
I also tried other packages like libtiff and tifffile, but the same problem occurs here as well.
import os
import glob
from PIL import Image, TiffTags
from IPython.display import display
import numpy as np

# search for all the images in the path
images = [file for file in os.listdir("C:/Home/Slicer/tif") if file.endswith('tif')]
for image in images:
    img_name = str(image)
    img = Image.open("C:/Home/Slicer/tif/" + img_name)
    print(img_name + str(img))
    display(img)
    # save tiff file with new name
    img.save("C:/Home/Slicer/tif/" + "compressed" + img_name, compression="tiff_lzw")
Console output: (screenshot not included)

Does google cloud vision api( source path- gcsSource) supports image detection (image contains text) in PDF file?

I am using OCR with TEXT_DETECTION and DOCUMENT_TEXT_DETECTION to process a PDF file (InputConfig mimeType "application/pdf"). Currently, images are getting skipped during processing. Is there any way to process images (containing text) in a PDF file?
To answer your question: yes, there is a way to process images with text in PDF files. According to Google's official documentation, this is normally done using OCR with DOCUMENT_TEXT_DETECTION [1].
The Vision API can detect and transcribe text from PDF and TIFF files stored in Cloud Storage. Document text detection from PDF and TIFF must be requested using the files:asyncBatchAnnotate function, which performs an offline (asynchronous) request and provides its status using the operations resources. The output from a PDF/TIFF request is written to a JSON file created in the specified Cloud Storage bucket.[2]
[1]https://cloud.google.com/vision/docs/ocr#optical_character_recognition_ocr
[2]https://cloud.google.com/vision/docs/pdf#vision_text_detection_pdf_gcs-gcloud
EDIT
I don't know what language you are using, but I tried this Python code and it processes a PDF with images without skipping them.
You need to install google-cloud-storage and google-cloud-vision.
In gcs_source_uri you have to specify your bucket name and the PDF file you are using.
In gcs_destination_uri you only have to specify your bucket name; leave pdf_result as it is.
import os
import re
import json
from google.cloud import vision
from google.cloud import storage

# pip install --upgrade google-cloud-storage
# pip install --upgrade google-cloud-vision

credential_path = 'your_path'
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = credential_path

client = vision.ImageAnnotatorClient()
batch_size = 2
mime_type = 'application/pdf'
feature = vision.Feature(type_=vision.Feature.Type.DOCUMENT_TEXT_DETECTION)

gcs_source_uri = 'gs://your_bucketname/your_pdf_File.pdf'
gcs_source = vision.GcsSource(uri=gcs_source_uri)
input_config = vision.InputConfig(gcs_source=gcs_source, mime_type=mime_type)

gcs_destination_uri = 'gs://your_bucketname/pdf_result'
gcs_destination = vision.GcsDestination(uri=gcs_destination_uri)
output_config = vision.OutputConfig(gcs_destination=gcs_destination, batch_size=batch_size)

async_request = vision.AsyncAnnotateFileRequest(
    features=[feature], input_config=input_config, output_config=output_config
)
operation = client.async_batch_annotate_files(requests=[async_request])
operation.result(timeout=180)

storage_client = storage.Client()
match = re.match(r'gs://([^/]+)/(.+)', gcs_destination_uri)
bucket_name = match.group(1)
prefix = match.group(2)
bucket = storage_client.get_bucket(bucket_name)

# List objects with the given prefix
blob_list = list(bucket.list_blobs(prefix=prefix))
print('Output files: ')
for blob in blob_list:
    print(blob.name)

output = blob_list[0]
json_string = output.download_as_string()
response = json.loads(json_string)

first_page_response = response['responses'][0]
annotation = first_page_response['fullTextAnnotation']
print('Full text:\n')
print(annotation['text'])

Generate thumbnail for inmemory uploaded video file

The client app uploads a video file; I need to generate a thumbnail, dump it to AWS S3, and return the client a link to the thumbnail.
I searched around and found ffmpeg fit for the purpose.
The following is the code I could come up with:
import os
import tempfile
import traceback

from ffmpy import FFmpeg

def generate_thumbnails(file_name):
    output_file = tempfile.NamedTemporaryFile(suffix='.jpg', delete=False, prefix=file_name)
    output_file_path = output_file.name
    try:
        # generate the thumbnail using the first frame of the video
        ff = FFmpeg(inputs={file_name: None},
                    outputs={output_file_path: ['-ss', '00:00:01', '-vframes', '1']})
        ff.run()
        # upload generated thumbnail to s3 logic
        # return uploaded s3 path
    except Exception:
        error = traceback.format_exc()
        write_error_log(error)
    finally:
        os.remove(output_file_path)
    return ''
I was using Django and was greeted with a permission error for the above.
I found out later that ffmpeg requires the file to be on disk and doesn't take the in-memory uploaded file into account (I may be wrong, as I assumed this).
Is there a way to read an in-memory video file like a normal one using ffmpeg, or should I use StringIO and dump it onto a temp file?
I prefer not to do the latter as it is an overhead.
Any alternative solution with a better benchmark would also be appreciated.
Thanks.
Update:
To save the inmemory uploaded file to disk: How to copy InMemoryUploadedFile object to disk
One of the possible ways i got it to work were as follows:
Steps:
a) read the InMemory uploaded file onto a temp file chunk by chunk
temp_file = tempfile.NamedTemporaryFile(suffix='.mp4', delete=False)
temp_file_path = temp_file.name
with open(temp_file_path, 'wb+') as destination:
    for chunk in in_memory_file_content.chunks():
        destination.write(chunk)
b) generate thumbnail using ffmpeg and subprocess
ffmpeg_command = 'ffmpeg -y -i {} -ss 00:00:01 -vframes 1 {}'.format(video_file_path, thumbnail_file_path)
subprocess.call(ffmpeg_command, shell=True)
where,
-y overwrites the destination if it already exists
-ss 00:00:01 seeks to the one-second mark, and -vframes 1 grabs a single frame there
More info on ffmpeg: https://ffmpeg.org/ffmpeg.html
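As for avoiding the temp file entirely: ffmpeg can also read its input from stdin via pipe:0, so the in-memory upload could be streamed straight into the process. A sketch under the assumption that the upload object exposes Django's UploadedFile.chunks() API; note that piping can fail for MP4s whose moov atom sits at the end of the file, so treat this as an optimization to try, not a guaranteed replacement:

```python
# Sketch: feed the in-memory upload to ffmpeg over stdin instead of a temp file.
import subprocess

def build_thumbnail_cmd(thumbnail_path):
    # read video from stdin ("pipe:0"), write one frame at the 1s mark
    return ['ffmpeg', '-y', '-i', 'pipe:0',
            '-ss', '00:00:01', '-vframes', '1', thumbnail_path]

def thumbnail_from_upload(uploaded_file, thumbnail_path):
    proc = subprocess.Popen(build_thumbnail_cmd(thumbnail_path),
                            stdin=subprocess.PIPE)
    for chunk in uploaded_file.chunks():  # Django's UploadedFile API
        proc.stdin.write(chunk)
    proc.stdin.close()
    return proc.wait() == 0  # True on success
```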

Python - fetching image from urllib and then reading EXIF data from PIL Image not working

I use the following code to fetch an image from a url in python :
import urllib
from PIL import Image
urllib.urlretrieve("http://www.gunnerkrigg.com//comics/00000001.jpg", "00000001.jpg")
filename = '00000001.jpg'
img = Image.open(filename)
exif = img._getexif()
However, this way the EXIF data is always None. But when I download the image by hand and then read the EXIF data in Python, it is not None.
I have also tried the following approach (from Downloading a picture via urllib and python):
import urllib
f = open('00000001.jpg','wb')
f.write(urllib.urlopen('http://www.gunnerkrigg.com//comics/00000001.jpg').read())
f.close()
filename = '00000001.jpg'
img = Image.open(filename)
exif = img._getexif()
But this gives me 'None' for 'exif' again. Could someone please point out what I may do to solve this problem?
Thank you!
The .jpg you are using contains no EXIF information. If you try the same Python code with an EXIF sample from http://www.exif.org/samples/, I think you will find it works.
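One quick way to confirm the reading code itself is fine is to write an EXIF tag into an image yourself and read it back. A sketch using modern Pillow, which offers the public Image.Exif / getexif() API (Pillow 6+) alongside the private _getexif():

```python
# Round-trip check: write an EXIF tag with Pillow, reopen, and read it back.
import io
from PIL import Image

img = Image.new("RGB", (8, 8), "blue")
exif = Image.Exif()
exif[271] = "TestMake"          # tag 271 = Make
buf = io.BytesIO()
img.save(buf, format="JPEG", exif=exif)

buf.seek(0)
reread = Image.open(buf).getexif()
print(reread[271])              # prints "TestMake"
```

If this round trip works but a downloaded file still yields None, the file simply carries no EXIF data.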

Read a gzip file from a url with zlib in Python 2.7

I'm trying to read a gzip file from a URL without saving a temporary file in Python 2.7. However, for some reason I get a truncated text file. I have spent quite some time searching the net for solutions without success. There is no truncation if I save the "raw" data back into a gzip file (see sample code below). What am I doing wrong?
My example code:
import urllib2
import zlib
from StringIO import StringIO
url = "ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/VCF/clinvar_00-latest.vcf.gz"
# Create an opener
opener = urllib2.build_opener()
request = urllib2.Request(url)
request.add_header('Accept-encoding', 'gzip')

# Fetch the gzip file
respond = opener.open(request)
compressedData = respond.read()
respond.close()
opener.close()

# Extract data and save to text file
compressedDataBuf = StringIO(compressedData)
d = zlib.decompressobj(16 + zlib.MAX_WBITS)
buffer = compressedDataBuf.read(1024)
saveFile = open('/tmp/test.txt', "wb")
while buffer:
    saveFile.write(d.decompress(buffer))
    buffer = compressedDataBuf.read(1024)
saveFile.close()

# Save "raw" data to a new gzip file.
saveFile = open('/tmp/test.gz', "wb")
saveFile.write(compressedData)
saveFile.close()
Because that gzip file consists of many concatenated gzip streams, as permitted by RFC 1952. The gzip command-line tool automatically decompresses all of the streams, but a single zlib decompress object stops at the end of the first one, which is why your output is truncated.
You need to detect the end of each gzip stream and restart the decompression with the subsequent compressed data. Look at unused_data in the Python documentation.
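That unused_data loop can be sketched like this (built on two gzip members concatenated in memory, so it runs without the FTP download; written for Python 3):

```python
# Decompress a stream of concatenated gzip members with zlib,
# restarting on unused_data at each member boundary.
import gzip
import zlib

def gunzip_multi(data):
    out = []
    while data:
        d = zlib.decompressobj(16 + zlib.MAX_WBITS)
        out.append(d.decompress(data))
        out.append(d.flush())
        data = d.unused_data  # leftover bytes belong to the next member
    return b"".join(out)

# two independently gzipped members, concatenated
blob = gzip.compress(b"hello ") + gzip.compress(b"world")
print(gunzip_multi(blob))  # b'hello world'
```

The key point is that decompress() stops at the end of the current gzip member and parks whatever follows in unused_data, so the loop simply starts a fresh decompressor on that remainder until nothing is left.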