How to put an image resized with Pillow into GridFS? - python-2.7

Scenario
I use a formData form to upload an image via ajax which is then added to a MongoDB GridFS database.
This was working:
my_image = request.files.my_image
raw = my_image.file.read()
fs.put(raw)
Desired Behaviour
I want to resize the image with Pillow before adding to GridFS.
What I Tried
I changed the above to:
my_image = request.files.my_image
raw = Image.open(my_image.file.read())
raw_resized = raw.resize((new_dimensions))
fs.put(raw_resized)
Actual Behaviour
I am now getting 500 errors. Tail shows:
TypeError: file() argument 1 must be encoded string without NULL bytes, not str
Question
How do I properly handle the Pillow image object so that I can add it to GridFS?
Troubleshooting
This is still unresolved, but I'm adding my attempts to understand what is happening with file types etc. at different stages of the process, using the interpreter:
>>> my_image = open("my_great_image.jpg")
>>> my_image
<open file 'my_great_image.jpg', mode 'r' at 0x0259CF40>
>>> type(my_image)
<type 'file'>
>>> my_image_read = my_image.read()
>>> my_image_read
# lots of image data
>>> type(my_image_read)
<type 'str'>
>>> my_pil_image = Image.open("my_great_image.jpg")
>>> my_pil_image
<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=400x267 at 0x2760CD8>
>>> type(my_pil_image)
<type 'instance'>
So from this I think I can deduce that, originally, GridFS was accepting of the string version of the image generated from the read() method.
So I think I need to somehow make the Pillow image object a string in order to get it into GridFS.

Solution
Whoa, check this out, it works. The logic is:
Form uploads image,
Pillow does three different resizes,
Use StringIO to convert Pillow objects to strings
Put resized images, as strings, in GridFS.
Python
from PIL import Image
import StringIO
# define three new image sizes
card_dimensions = (108,108)
main_dimensions = (42,38)
thumb_dimensions = (18,14)
# convert uploaded image to Pillow object
raw_pil = Image.open(my_image.file)
# conversion to card size
raw_card_output = StringIO.StringIO()
raw_card = raw_pil.resize((card_dimensions))
raw_card.save(raw_card_output,format=raw_pil.format)
raw_card_output_contents = raw_card_output.getvalue()
# conversion to main size
raw_main_output = StringIO.StringIO()
raw_main = raw_pil.resize((main_dimensions))
raw_main.save(raw_main_output,format=raw_pil.format)
raw_main_output_contents = raw_main_output.getvalue()
#conversion to thumb size
raw_thumb_output = StringIO.StringIO()
raw_thumb = raw_pil.resize((thumb_dimensions))
raw_thumb.save(raw_thumb_output,format=raw_pil.format)
raw_thumb_output_contents = raw_thumb_output.getvalue()
# put card image into GridFS
fs.put(raw_card_output_contents)
# put main image into GridFS
fs.put(raw_main_output_contents)
# put thumb image into GridFS
fs.put(raw_thumb_output_contents)
Further Explanation
Basically I deduced that GridFS was accepting a string, so therefore I needed to transform the Pillow object into a string.
The interpreter troubleshooting below should make some of the dynamics clearer and carries on from the troubleshooting in the original post:
>>> my_pil_image
<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=400x267 at 0x2760CD8>
>>> my_pil_image_resized = my_pil_image.resize((50,50))
>>> my_pil_image_resized
<PIL.Image.Image image mode=RGB size=50x50 at 0x2898710>
>>> output = StringIO.StringIO()
>>> my_pil_image_resized.save(output,format="JPEG")
>>> contents = output.getvalue()
>>> type(contents)
<type 'str'>
>>> contents
# lots of image data
So basically the above process shows the mechanics of how to convert the Pillow object into a string, which can then be added to GridFS.
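The three near-identical resize blocks in the answer above could be folded into a loop. The following is a minimal sketch, not the original author's code: the names SIZES, resize_to_bytes and store_resized are hypothetical, io.BytesIO stands in for StringIO (it exists on both Python 2.7 and 3), and fs is assumed to be any object with a GridFS-style put(data, **kwargs) method.

```python
from io import BytesIO

from PIL import Image

# Hypothetical size table; the dimensions mirror the answer above.
SIZES = {"card": (108, 108), "main": (42, 38), "thumb": (18, 14)}


def resize_to_bytes(image, size, fmt=None):
    """Return `image` resized to `size`, encoded as a byte string."""
    # A resized copy has no .format of its own, so take it from the original
    # and fall back to PNG if the original has none either.
    fmt = fmt or image.format or "PNG"
    buf = BytesIO()
    image.resize(size).save(buf, format=fmt)
    return buf.getvalue()


def store_resized(fs, fileobj, sizes=SIZES):
    """Resize the upload once per entry in `sizes` and put each result into `fs`.

    Returns a dict mapping the size name to whatever fs.put() returned.
    """
    original = Image.open(fileobj)
    ids = {}
    for name, size in sizes.items():
        data = resize_to_bytes(original, size)
        # The extra keyword is stored alongside the file; "variant" is an
        # illustrative field name, not something GridFS requires.
        ids[name] = fs.put(data, variant=name)
    return ids
```

With the names from the question, the call would look like store_resized(fs, my_image.file).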

I found a simple way to do this with Flask.
I use werkzeug's FileStorage as the file holder before uploading to GridFS.
It also lets me wrap the Pillow image after resizing.
The document looks like this:
from mongoengine import Document, ImageField
class SomeDocument(Document):
    icon = ImageField(required=False, collection_name="collection_name")
And the controller is
from io import BytesIO
from werkzeug.datastructures import FileStorage
from PIL import Image
upload = request.files.get('icon')
im = Image.open(upload)
im.thumbnail((400, 400))
output = BytesIO()
im.save(output, format=im.format, quality=90)
output.seek(0)  # rewind so the buffer is read from the start
original_extension = upload.filename.rsplit('.', 1)[1].lower()
doc = SomeDocument()  # put() and save() need a document instance, not the class
doc.icon.put(FileStorage(output, content_type=f"image/{original_extension}"))
doc.save()
im.close()
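Building the content type from the file extension gives "image/jpg" for a .jpg upload, which is not the registered MIME type for JPEG. One possible alternative, sketched here with the standard library's mimetypes module (the helper name content_type_for is hypothetical):

```python
import mimetypes


def content_type_for(filename, default="application/octet-stream"):
    """Guess a MIME type from a filename, falling back to `default`."""
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or default
```

For example, content_type_for("photo.jpg") gives "image/jpeg", and a name without a recognised extension falls back to the default.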

Related

Reading multiple files in a directory with pyyaml

I'm trying to read all YAML files in a directory, but I am having trouble. I am using Python 2.7 (and I cannot change to 3), and all of my files are UTF-8 (and I need them to stay that way).
import os
import yaml
import codecs
def yaml_reader(filepath):
with codecs.open(filepath, "r", encoding='utf-8') as file_descriptor:
data = yaml.load_all(file_descriptor)
return data
def yaml_dump(filepath, data):
with open(filepath, 'w') as file_descriptor:
yaml.dump(data, file_descriptor)
if __name__ == "__main__":
filepath = os.listdir(os.getcwd())
data = yaml_reader(filepath)
print data
When I run this code, python gives me the message:
TypeError: coercing to Unicode: need string or buffer, list found.
I want this program to show the content of the files. Can anyone help me?
I guess the issue is with filepath.
os.listdir(os.getcwd()) returns a list of all the files in the directory, so you are passing that list to codecs.open() instead of a single filename.
There are multiple problems with your code, apart from it being invalid Python in the way you formatted it.
def yaml_reader(filepath):
    with codecs.open(filepath, "r", encoding='utf-8') as file_descriptor:
        # load_all() is lazy, so consume it while the file is still open
        data = list(yaml.load_all(file_descriptor))
    return data
However, it is not necessary to do the decoding yourself; PyYAML is perfectly capable of processing UTF-8:
def yaml_reader(filepath):
    with open(filepath, "rb") as file_descriptor:
        # load_all() is lazy, so consume it while the file is still open
        data = list(yaml.load_all(file_descriptor))
    return data
I hope you realise you're trying to load multiple documents, so you always get a list as a result in data, even if your file contains only one document.
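To illustrate the point about multiple documents, here is a small sketch. It uses yaml.safe_load_all rather than load_all (a substitution on my part, since it needs no Loader argument), and the sample text is made up:

```python
import yaml

# Two YAML documents in one stream, separated by "---".
TWO_DOCS = u"name: first\n---\nname: second\n"
ONE_DOC = u"name: only\n"

# safe_load_all() yields one Python object per document, so wrapping it
# in list() always produces a list, even for a single-document stream.
documents = list(yaml.safe_load_all(TWO_DOCS))
single = list(yaml.safe_load_all(ONE_DOC))

print(len(documents), len(single))
```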
Then the line:
filepath = os.listdir(os.getcwd())
gives you a list of files, so you need to do:
filepath = os.listdir(os.getcwd())[0]
or decide in some other way, which of the files you want to open. If you want to combine all files (assuming they are YAML) in one big YAML file, you need to do:
if __name__ == "__main__":
    data = []
    for filepath in os.listdir(os.getcwd()):
        data.extend(yaml_reader(filepath))
    print data
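Note that os.listdir() returns every entry in the directory, including the script itself and any subdirectories. A minimal sketch of one way to keep only YAML files (the helper name yaml_files and the choice of extensions are assumptions):

```python
import os


def yaml_files(directory):
    """Return sorted paths of the regular *.yaml / *.yml files in `directory`."""
    paths = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # Skip subdirectories and anything without a YAML extension.
        if os.path.isfile(path) and name.lower().endswith((".yaml", ".yml")):
            paths.append(path)
    return paths
```

The loop above could then iterate over yaml_files(os.getcwd()) instead of os.listdir(os.getcwd()).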
And your dump routine would need to change to:
def yaml_dump(filepath, data):
    with open(filepath, 'wb') as file_descriptor:
        yaml.dump(data, file_descriptor, allow_unicode=True, encoding='utf-8')
However, this all brings you to the biggest problem: you are using PyYAML, which will mangle your YAML, dropping flow style, comments, anchor names, special int/float, quotes around scalars, etc. Apart from that, PyYAML has not been updated to support YAML 1.2 documents (which has been the standard since 2009). I recommend you switch to using ruamel.yaml (disclaimer: I am the author of that package), which supports YAML 1.2 and leaves comments etc. in place.
And even if you are bound to use Python 2, you should use the Python 3 like syntax e.g. for print that you can get with from __future__ imports.
So I recommend you do:
pip install pathlib2 ruamel.yaml
and then use:
from __future__ import absolute_import, unicode_literals, print_function
from pathlib2 import Path  # on Python 2 the backport is imported as pathlib2
from ruamel.yaml import YAML
if __name__ == "__main__":
    data = []
    yaml = YAML()
    yaml.preserve_quotes = True
    for filepath in Path('.').glob('*.yaml'):
        data.extend(yaml.load_all(filepath))
    print(data)
    yaml.dump(data, Path('your_output.yaml'))

(imageio or celery) Error closing: 'Image' object has no attribute 'fp'

I am using imageio to write png images to file.
import numpy as np
import matplotlib.cm as cm
import imageio # for saving the image
import matplotlib as mpl
hm_colors = ['blue', 'white','red']
cmap = mpl.colors.LinearSegmentedColormap.from_list('bwr', hm_colors)
data = np.array([[1,2,3],[5,6,7]])
norm = mpl.colors.Normalize(vmin=-3, vmax=3)
colormap = cm.ScalarMappable(norm=norm, cmap=cmap)
im = colormap.to_rgba(data)
# scale the data to a width of w pixels
im = np.repeat(im, w, axis=1)
im = np.repeat(im, h, axis=0)
# save the picture
imageio.imwrite("my_img.png", im)
This process is performed automatically and I noticed some Error messages saying:
Error closing: 'Image' object has no attribute 'fp'.
Before this message I get warning:
/usr/local/lib/python2.7/dist-packages/imageio/core/util.py:78: UserWarning: Lossy conversion from float64 to uint8, range [0, 1] dtype_str, out_type.__name__))
However, the images seem to be generated and saved just fine.
I can't find data to recreate this message.
Any idea why I get this error and why it doesn't noticeably affect the results? I don't use PIL.
One possible reason could come from using this in Celery.
Thanks!
L.
I encountered the same issue using imageio.imwrite in Python 3.5. It is fairly harmless, except that it stops garbage collection and leads to excessive memory usage when writing thousands of images. The solution was to use the PIL module, which is a dependency of imageio. Replace the last line of your code with:
from PIL import Image
# to_rgba() returns floats in [0, 1]; PIL needs 8-bit integers here
image = Image.fromarray((im * 255).astype(np.uint8))
image.save('my_img.png')
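The "Lossy conversion from float64 to uint8" warning in the question arises because to_rgba() produces floats in the range [0, 1] while PNG stores 8-bit integers. A minimal sketch of doing that conversion explicitly before saving, so nothing has to guess at the scaling (the helper name to_uint8 is hypothetical, and rounding to nearest is my choice):

```python
import numpy as np


def to_uint8(rgba):
    """Scale a float image with values in [0, 1] to uint8 values in [0, 255]."""
    clipped = np.clip(rgba, 0.0, 1.0)  # guard against slightly out-of-range values
    return (clipped * 255).round().astype(np.uint8)
```

The result can be handed to either imageio.imwrite or Image.fromarray.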

Python: Converting PIL.Image to a object that has object.read()

I have a variable of type PIL.Image.
I need to pass it to a function (that I didn't write) that will save it to disk for me. Let's call that function file_creator(my_PIL_image).
In the implementation of file_creator() a .read() method is used on my_PIL_image variable.
I need to convert my_PIL_image into a type that the file_creator() can read.
Any fancy suggestions?
Got the solution from a Python guy.
What I needed to do was encode the image to base64:
from StringIO import StringIO
from base64 import b64encode
img_io = StringIO()
my_PIL_image.save(img_io, 'JPEG')
img_data = img_io.getvalue()
data_url = "data:image/jpeg;base64," + b64encode(img_data)
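If the receiving function only needs an object with a .read() method, as the question describes, an in-memory buffer may be enough without any base64 step. A minimal sketch under that assumption (the helper name as_readable is hypothetical, and io.BytesIO is used because it works on both Python 2.7 and 3):

```python
from io import BytesIO

from PIL import Image


def as_readable(pil_image, fmt="JPEG"):
    """Encode `pil_image` into an in-memory file object positioned at the start."""
    buf = BytesIO()
    pil_image.save(buf, format=fmt)
    buf.seek(0)  # rewind, otherwise read() would return nothing
    return buf
```

The call would then be file_creator(as_readable(my_PIL_image)), assuming file_creator needs nothing beyond .read().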

What data 'structure' does fs.get_last_version return?

When I use get_last_version to get an image from the database, what is actually returned, i.e. an array, the merged binary data of all the chunks that make up the file (as a string), or something else?
dbname = 'grid_files'
db = connection[dbname]
fs = gridfs.GridFS(db)
filename = "my_image.jpg"
my_image_file = fs.get_last_version(filename=filename)
I'm wanting to base64 encode my_image_file with:
import base64
encoded_img_file = base64.b64encode(my_image_file)
return encoded_img_file
But I'm getting a 500 error.
I haven't been able to glean what is actually returned when using get_last_version from the docs:
http://api.mongodb.org/python/current/api/gridfs/#gridfs.GridFS.get_last_version
More Research:
I followed the logic from this post:
http://blog.pythonisito.com/2012/05/gridfs-mongodb-filesystem.html
And in shell running Python on server could see that Binary() was returned - so should I be able to base64 encode this as demonstrated above?:
>>> import pymongo
>>> import gridfs
>>> import os
>>> hostname = os.environ['OPENSHIFT_MONGODB_DB_URL']
>>> conn = pymongo.MongoClient(host=hostname)
>>> db = conn.grid_files
>>> fs = gridfs.GridFS(db)
>>> list(db.fs.chunks.find())
[{u'files_id': ObjectId('52db4d9e70914413718f2ec4'), u'_id': ObjectId('52db4d9e7
0914413718f2ec5'), u'data': Binary('lots of binary code', 0), u'n': 0}]
Unless there is a better answer, this is what I've come up with.
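get_last_version returns a GridOut object, which is a read-only file-like object. Calling read() on it returns the file's contents (all chunks joined together) as a byte string. The Binary() values seen above are how the individual chunks are stored in fs.chunks.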
In regards to base64 encoding it, and returning it, this is how I did it:
dbname = 'grid_files'
db = connection[dbname]
fs = gridfs.GridFS(db)
filename = "my_image.jpg"
my_image_file = fs.get_last_version(filename=filename)
encoded_img_file = base64.b64encode(my_image_file.read())
return encoded_img_file
Then accessed it on the front end with:
$("#my_img").attr("src", "data:image/png;base64," + data);
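The encoding step can be sketched without a database, since it only needs the bytes that read() returns. This is an illustration with a hypothetical helper name, to_data_url; the .decode("ascii") matters on Python 3, where b64encode returns bytes:

```python
import base64


def to_data_url(raw_bytes, mime_type="image/jpeg"):
    """Build a data URL from raw image bytes."""
    encoded = base64.b64encode(raw_bytes).decode("ascii")
    return "data:{0};base64,{1}".format(mime_type, encoded)
```

With the names above, to_data_url(my_image_file.read()) would produce the whole src value on the server, so the front end would not need to hard-code the MIME type.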

Python - fetching image from urllib and then reading EXIF data from PIL Image not working

I use the following code to fetch an image from a url in python :
import urllib
from PIL import Image
urllib.urlretrieve("http://www.gunnerkrigg.com//comics/00000001.jpg", "00000001.jpg")
filename = '00000001.jpg'
img = Image.open(filename)
exif = img._getexif()
However, this way the EXIF data is always None. But when I download an image by hand and then read the EXIF data in Python, the EXIF data is not None.
I have also tried the following approach (from Downloading a picture via urllib and python):
import urllib
f = open('00000001.jpg','wb')
f.write(urllib.urlopen('http://www.gunnerkrigg.com//comics/00000001.jpg').read())
f.close()
filename = '00000001.jpg'
img = Image.open(filename)
exif = img._getexif()
But this gives me 'None' for 'exif' again. Could someone please point out what I may do to solve this problem?
Thank you!
The .jpg you are using contains no exif information. If you try the same python with an exif example from http://www.exif.org/samples/ , I think you will find it works.
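One way to check whether a given file carries EXIF data at all is sketched below. It assumes a Pillow release that provides the public getexif() method (6.0 or later, to my knowledge) instead of the private _getexif() used in the question; the helper name exif_tags is hypothetical.

```python
from PIL import Image


def exif_tags(path_or_file):
    """Return the image's EXIF tags as a plain dict (empty if there are none)."""
    with Image.open(path_or_file) as img:
        return dict(img.getexif())
```

An empty dict for the downloaded file and a non-empty one for a camera photo would confirm that the download itself is not the problem.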