Django with Pluggable MongoDB Storage troubles - django

I'm trying to use Django and MongoEngine to provide the storage backend only (via GridFS). I still have a MySQL database for everything else.
I'm running into a strange (to me) error when deleting from the Django admin and am wondering if I am doing something incorrectly.
My code looks like this:
# settings.py
from mongoengine import connect
connect("mongo_storage")
# models.py
from django.db import models
from mongoengine.django.storage import GridFSStorage

class MyFile(models.Model):
    name = models.CharField(max_length=50)
    content = models.FileField(upload_to="appsfiles", storage=GridFSStorage())
    creation_time = models.DateTimeField(auto_now_add=True)
    last_update_time = models.DateTimeField(auto_now=True)
I am able to upload files just fine, but when I delete them something seems to break, and the Mongo database gets into an unworkable state until I manually delete all FileDocument.objects. When this happens I can't upload files or delete them from the Django interface.
From the stack trace I have:
/home/projects/vector/src/mongoengine/django/storage.py in _get_doc_with_name
    doc = [d for d in docs if getattr(d, self.field).name == name] ...
Local vars:
    _[1]   []
    d
    docs   Error in formatting: cannot set options after executing query
    name   u'testfile.pdf'
    self
/home/projects/vector/src/mongoengine/fields.py in __getattr__
    raise AttributeError
Am I using this feature incorrectly?
UPDATE:
Thanks to @zeekay's answer I was able to get a working GridFS storage plugin. I ended up not using MongoEngine at all. I put my adapted solution on GitHub; there is a clear sample project showing how to use it. I also uploaded the project to PyPI.
Another Update:
I'd highly recommend the django-storages project. It has lots of storage backend options and is used by many more people than my original proposed solution.
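For reference, a minimal django-storages configuration looks roughly like this; the S3 backend is shown here purely as an example of wiring in a pluggable backend, and the bucket name and credentials are placeholders:
# settings.py -- minimal django-storages sketch (S3 backend as an example only;
# bucket name and credentials below are placeholders)
DEFAULT_FILE_STORAGE = 'storages.backends.s3boto3.S3Boto3Storage'
AWS_STORAGE_BUCKET_NAME = 'my-bucket'
AWS_ACCESS_KEY_ID = '...'
AWS_SECRET_ACCESS_KEY = '...'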

I think you are better off not using MongoEngine for this; I haven't had much luck with it either. Here is a drop-in replacement for mongoengine.django.storage.GridFSStorage, which works with the admin.
from django.core.files.storage import Storage
from django.conf import settings
from pymongo import Connection
from gridfs import GridFS

class GridFSStorage(Storage):
    def __init__(self, host='localhost', port=27017, collection='fs'):
        for s in ('host', 'port', 'collection'):
            name = 'GRIDFS_' + s.upper()
            if hasattr(settings, name):
                setattr(self, s, getattr(settings, name))
        for s, v in zip(('host', 'port', 'collection'), (host, port, collection)):
            if v:
                setattr(self, s, v)
        self.db = Connection(host=self.host, port=self.port)[self.collection]
        self.fs = GridFS(self.db)

    def _save(self, name, content):
        self.fs.put(content, filename=name)
        return name

    def _open(self, name, *args, **kwargs):
        return self.fs.get_last_version(filename=name)

    def delete(self, name):
        oid = self.fs.get_last_version(filename=name)._id
        self.fs.delete(oid)

    def exists(self, name):
        return self.fs.exists({'filename': name})

    def size(self, name):
        return self.fs.get_last_version(filename=name).length
GRIDFS_HOST, GRIDFS_PORT and GRIDFS_COLLECTION can be defined in your settings or passed as host, port, collection keyword arguments to GridFSStorage in your model's FileField.
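For illustration, wiring the class into a model could look like this; the myapp.storage import path is just an assumption about where you saved the class:
# models.py -- hypothetical model using the storage class above
from django.db import models
from myapp.storage import GridFSStorage  # assumed module path

class MyFile(models.Model):
    name = models.CharField(max_length=50)
    content = models.FileField(upload_to='appsfiles', storage=GridFSStorage())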
I referred to Django's custom storage documentation, and loosely followed this answer to a similar question.


Avoiding circular imports in Django Models (Config class)

I've created a Configuration model in Django so that the site admin can change some settings on the fly; however, some of the models rely on these configurations. I'm using Django 2.0.2 and Python 3.6.4.
I created a config.py file in the same directory as models.py.
Let me paracode (paraphrase the code? The real Enum has many more options):
# models.py
from django.db import models
from .config import *

class Configuration(models.Model):
    starting_money = models.IntegerField(default=1000)

class Person(models.Model):
    funds = models.IntegerField(default=getConfig(ConfigData.STARTING_MONEY))
# config.py
from enum import Enum
from .models import Configuration

class ConfigData(Enum):
    STARTING_MONEY = 1

def getConfig(data):
    if not isinstance(data, ConfigData):
        raise TypeError(f"{data} is not a valid configuration type")
    try:
        config = Configuration.objects.get_or_create()
    except Configuration.MultipleObjectsReturned:
        # Cleans database in case multiple configurations exist.
        Configuration.objects.exclude(Configuration.objects.first()).delete()
        return getConfig(data)
    if data is ConfigData.MAXIMUM_STAKE:
        return config.max_stake
How can I do this without an import error? I've tried absolute imports.
You can postpone loading models.py by importing it inside the getConfig(data) function; as a result, we no longer need models.py at the time we load config.py:
# config.py (no import of models at the top)
from enum import Enum

class ConfigData(Enum):
    STARTING_MONEY = 1

def getConfig(data):
    from .models import Configuration
    if not isinstance(data, ConfigData):
        raise TypeError(f"{data} is not a valid configuration type")
    try:
        # get_or_create() returns an (object, created) tuple
        config, _ = Configuration.objects.get_or_create()
    except Configuration.MultipleObjectsReturned:
        # Cleans database in case multiple configurations exist.
        Configuration.objects.exclude(pk=Configuration.objects.first().pk).delete()
        return getConfig(data)
    if data is ConfigData.MAXIMUM_STAKE:
        return config.max_stake
We thus do not load models.py when config.py is loaded. We only load it (if it is not loaded already) when we actually execute the getConfig function, which happens later in the process.
Willem Van Onsem's solution is a good one. I have a different approach which I have used for circular model dependencies, using Django's application registry. I post it here as an alternate solution, in part because I'd like feedback from more experienced Python coders as to whether or not there are problems with this approach.
In a utility module, define the following method:
from django.apps import apps as django_apps

def model_by_name(app_name, model_name):
    return django_apps.get_app_config(app_name).get_model(model_name)
Then in your getConfig, omit the import and replace the line
config, _ = Configuration.objects.get_or_create()
with the following:
config_class = model_by_name(APP_NAME, 'Configuration')
config, _ = config_class.objects.get_or_create()
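For what it's worth, the registry also exposes get_model directly, so the helper can be collapsed to a direct lookup; the app label 'myapp' below is an assumption:
from django.apps import apps

# Equivalent lookup without the helper; 'myapp' is a placeholder app label.
Configuration = apps.get_model('myapp', 'Configuration')
config, _ = Configuration.objects.get_or_create()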

Are there any Django packages to create signed urls for Google Cloud Storage resources?

I'm writing a fairly simple photo app using django-rest-framework for the API and django-storages for the storage engine. The front end is being written in Vue.js. I have the uploading part working, and now I'm trying to serve up the photos. As now seems obvious, when the browser tries to load the images from GCS I just get a bunch of 403 Forbidden errors. I did some reading up on this and it seems that the best practice in my case would be to generate signed URLs that expire after some amount of time. I haven't been able to find a package for this, which is what I was hoping for. Short of that, it's not clear to me precisely how to do this in Django.
This is working code in Django 1.11 with Python 3.5.
import os

from google.oauth2 import service_account
from google.cloud import storage

class CloudStorageURLSigner(object):
    @staticmethod
    def get_video_signed_url(bucket_name, file_path):
        creds = service_account.Credentials.from_service_account_file(
            os.environ.get('GOOGLE_APPLICATION_CREDENTIALS')
        )
        bucket = storage.Client().get_bucket(bucket_name)
        blob = bucket.blob(file_path)
        signed_url = blob.generate_signed_url(
            method='PUT',
            expiration=1545367030,  # epoch time
            content_type='audio/mpeg',  # change accordingly
            credentials=creds
        )
        return signed_url
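For illustration, a hypothetical call might look like this; the bucket name and object path are placeholders:
# Hypothetical usage; bucket name and object path are placeholders.
upload_url = CloudStorageURLSigner.get_video_signed_url(
    bucket_name='my-photo-bucket',
    file_path='uploads/track.mp3',
)
# A client can now PUT the file directly to upload_url (Content-Type must match).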
Yes, take a look at google-cloud-storage
Installation:
pip install google-cloud-storage
Also, make sure to refer to the API documentation as you need more features.
Hope it helps!
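As a rough sketch of what that looks like in practice (the bucket and object names are placeholders, and credentials are assumed to come from the environment):
import datetime

from google.cloud import storage

# Placeholders: replace the bucket and blob names with your own.
client = storage.Client()
blob = client.bucket('my-bucket').blob('photos/example.jpg')
url = blob.generate_signed_url(expiration=datetime.timedelta(minutes=15), method='GET')
print(url)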
I ended up solving this problem by using to_representation in serializers.py:
import datetime

from google.cloud import storage
from google.cloud.storage import Blob

client = storage.Client()
bucket = client.get_bucket('myBucket')

def to_representation(self, value):
    try:
        blob = Blob(name=value.name, bucket=bucket)
        signed_url = blob.generate_signed_url(expiration=datetime.timedelta(minutes=5))
        return signed_url
    except ValueError as e:
        print(e)
        return value
Extending @Evan Zamir's answer, instead of reassigning client and bucket you can get them from Django's default_storage (this will save time since these are already available).
This is in settings.py:
from datetime import timedelta
from google.oauth2 import service_account
GS_CREDENTIALS = service_account.Credentials.from_service_account_file('credentials.json')
DEFAULT_FILE_STORAGE = "storages.backends.gcloud.GoogleCloudStorage"
GS_BUCKET_NAME = "my-bucket"
GS_EXPIRATION = timedelta(seconds=60)
In serializers.py:
from django.core.files.storage import default_storage
from google.cloud.storage import Blob
from rest_framework import serializers

class SignedURLField(serializers.FileField):
    def to_representation(self, value):
        try:
            blob = Blob(name=value.name, bucket=default_storage.bucket)
            signed_url = blob.generate_signed_url(expiration=default_storage.expiration)
            return signed_url
        except ValueError as e:
            print(e)
            return value
You can use this class in your serializer like this:
class MyModelSerializer(serializers.ModelSerializer):
    file = SignedURLField()
Note: Do not provide GS_DEFAULT_ACL = 'publicRead' if you want signed URLs, as it creates public URLs (that do not expire).

Creating a test database through the shell. Missing connection.creation.create_test_db()

The way this is supposed to be done is:
from django.db import connection
db = connection.creation.create_test_db() # Create the test db
However, the connection I import has no methods or attributes. Its type is django.db.DefaultConnectionProxy.
In django/db/__init__.py lies the definition:
class DefaultConnectionProxy(object):
    """
    Proxy for accessing the default DatabaseWrapper object's attributes. If you
    need to access the DatabaseWrapper object itself, use
    connections[DEFAULT_DB_ALIAS] instead.
    """
    def __getattr__(self, item):
        return getattr(connections[DEFAULT_DB_ALIAS], item)

    def __setattr__(self, name, value):
        return setattr(connections[DEFAULT_DB_ALIAS], name, value)
I've imported django.db.connections and found it has the following attributes/methods:
connections.all
connections.databases
connections.ensure_defaults
There is no sign of DEFAULT_DB_ALIAS.
I'm looking for ideas on how to debug this. Wouldn't want to post a ticket to Django if this has something to do with my configuration.

django-photologue upload_to

I have been playing around with django-photologue for a while, and find it a great alternative to all the other image handling apps out there.
One thing though, I also use django-cumulus to push my uploads to my CDN instead of running it on my local machine / server.
When I used imagekit, I could always pass an upload_to='whatever', but I cannot seem to do this with photologue as it automatically inserts the image field. How would I go about achieving some sort of an override?
Regards
I think you can hook into the pre_save signal of the Photo model, and change the upload_to field, just before the instance is saved to the database.
Take a look at this:
http://docs.djangoproject.com/en/dev/topics/signals/
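A very rough sketch of that idea, assuming photologue's Photo model and that swapping the field's upload_to before the file is processed is enough for your setup (untested):
# signals.py -- a sketch only; Photo's image field name and the target
# directory 'whatever' are assumptions.
from django.db.models.signals import pre_save
from django.dispatch import receiver
from photologue.models import Photo

@receiver(pre_save, sender=Photo)
def set_upload_path(sender, instance, **kwargs):
    # Note: this changes upload_to on the shared field object, so it affects
    # every Photo saved afterwards, not just this instance.
    instance.image.field.upload_to = 'whatever'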
Managed to find a workaround for it; however, this requires you to make the changes in photologue/models.py:
if PHOTOLOGUE_PATH is not None:
    if callable(PHOTOLOGUE_PATH):
        get_storage_path = PHOTOLOGUE_PATH
    else:
        parts = PHOTOLOGUE_PATH.split('.')
        module_name = '.'.join(parts[:-1])
        module = __import__(module_name)
        get_storage_path = getattr(module, parts[-1])
else:
    def get_storage_path(instance, filename):
        dirname = 'photos'
        if hasattr(instance, 'upload_to'):
            if callable(instance.upload_to):
                dirname = instance.upload_to()
            else:
                dirname = instance.upload_to
        return os.path.join(
            PHOTOLOGUE_DIR,
            os.path.normpath(force_unicode(datetime.now().strftime(smart_str(dirname)))),
            filename)
And then in your app's models do the following:
class MyModel(ImageModel):
    text = ...
    name = ...

    def upload_to(self):
        return 'yourdirectorynamehere'
You can use the setting PHOTOLOGUE_PATH to provide your own callable. Define a method which takes instance and filename as parameters, then return whatever you want. For example, in your settings.py:
import os

import photologue
...

def PHOTOLOGUE_PATH(instance, filename):
    folder = 'myphotos'  # Add your logic here
    return os.path.join(photologue.models.PHOTOLOGUE_DIR, folder, filename)
Presumably (although I've not tested this) you could find out more about the Photo instance (e.g. what other instances it relates to) and choose the folder accordingly.

Only accept a certain file type in FileField, server-side

How can I restrict FileField to only accept a certain type of file (video, audio, pdf, etc.) in an elegant way, server-side?
One very easy way is to use a custom validator.
In your app's validators.py:
import os

from django.core.exceptions import ValidationError

def validate_file_extension(value):
    ext = os.path.splitext(value.name)[1]  # [0] returns path+filename
    valid_extensions = ['.pdf', '.doc', '.docx', '.jpg', '.png', '.xlsx', '.xls']
    if ext.lower() not in valid_extensions:
        raise ValidationError('Unsupported file extension.')
Then in your models.py:
from .validators import validate_file_extension
... and use the validator for your model field:
class Document(models.Model):
    file = models.FileField(upload_to="documents/%Y/%m/%d", validators=[validate_file_extension])
See also: How to limit file types on file uploads for ModelForms with FileFields?.
Warning
For securing your code execution environment from malicious media files:
- Use Exif libraries to properly validate the media files.
- Separate your media files from your application code execution environment.
- If possible, use solutions like S3, GCS, Minio or anything similar.
- When loading media files on the client side, use client-native methods (for example, if you load the media files insecurely in a browser, it may cause execution of "crafted" JavaScript code).
Django 1.11 has a newly added FileExtensionValidator for model fields; the docs are here: https://docs.djangoproject.com/en/dev/ref/validators/#fileextensionvalidator.
An example of how to validate a file extension:
from django.core.validators import FileExtensionValidator
from django.db import models

class MyModel(models.Model):
    pdf_file = models.FileField(
        upload_to="foo/", validators=[FileExtensionValidator(allowed_extensions=["pdf"])]
    )
Note that this method is not safe. Citation from the Django docs:
Don’t rely on validation of the file extension to determine a file’s type. Files can be renamed to have any extension no matter what data they contain.
There is also a new validate_image_file_extension (https://docs.djangoproject.com/en/dev/ref/validators/#validate-image-file-extension) for validating image extensions (using Pillow).
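For example, a minimal sketch using it on a model field (the model and field names here are made up):
from django.core.validators import validate_image_file_extension
from django.db import models

class Avatar(models.Model):
    # Accepts only extensions that Pillow recognizes as image formats.
    picture = models.FileField(upload_to="avatars/", validators=[validate_image_file_extension])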
A few people have suggested using python-magic to validate that the file actually is of the type you are expecting to receive. This can be incorporated into the validator suggested in the accepted answer:
import os

import magic

from django.core.exceptions import ValidationError

def validate_is_pdf(file):
    valid_mime_types = ['application/pdf']
    file_mime_type = magic.from_buffer(file.read(1024), mime=True)
    if file_mime_type not in valid_mime_types:
        raise ValidationError('Unsupported file type.')
    valid_file_extensions = ['.pdf']
    ext = os.path.splitext(file.name)[1]
    if ext.lower() not in valid_file_extensions:
        raise ValidationError('Unacceptable file extension.')
This example only validates a pdf, but any number of mime-types and file extensions can be added to the arrays.
Assuming you saved the above in validators.py you can incorporate this into your model like so:
from myapp.validators import validate_is_pdf

class PdfFile(models.Model):
    file = models.FileField(upload_to='pdfs/', validators=(validate_is_pdf,))
You can use the below to restrict file types in your form (note that the accept attribute only filters the browser's file picker; it is not server-side validation):
file = forms.FileField(widget=forms.FileInput(attrs={'accept': 'application/pdf'}))
There's a Django snippet that does this:
import os

from django import forms

class ExtFileField(forms.FileField):
    """
    Same as forms.FileField, but you can specify a file extension whitelist.

    >>> from django.core.files.uploadedfile import SimpleUploadedFile
    >>>
    >>> t = ExtFileField(ext_whitelist=(".pdf", ".txt"))
    >>>
    >>> t.clean(SimpleUploadedFile('filename.pdf', 'Some File Content'))
    >>> t.clean(SimpleUploadedFile('filename.txt', 'Some File Content'))
    >>>
    >>> t.clean(SimpleUploadedFile('filename.exe', 'Some File Content'))
    Traceback (most recent call last):
    ...
    ValidationError: [u'Not allowed filetype!']
    """
    def __init__(self, *args, **kwargs):
        ext_whitelist = kwargs.pop("ext_whitelist")
        self.ext_whitelist = [i.lower() for i in ext_whitelist]
        super(ExtFileField, self).__init__(*args, **kwargs)

    def clean(self, *args, **kwargs):
        data = super(ExtFileField, self).clean(*args, **kwargs)
        filename = data.name
        ext = os.path.splitext(filename)[1]
        ext = ext.lower()
        if ext not in self.ext_whitelist:
            raise forms.ValidationError("Not allowed filetype!")
        return data

#-------------------------------------------------------------------------
if __name__ == "__main__":
    import doctest
    doctest.testmod()
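A hypothetical form using the field above (the form and field names are made up):
# forms.py -- hypothetical usage of ExtFileField
from django import forms

class UploadForm(forms.Form):
    document = ExtFileField(ext_whitelist=(".pdf", ".txt"))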
First, create a file named formatChecker.py inside the app where you have the model with the FileField that you want to accept a certain file type.
This is your formatChecker.py:
from django.db.models import FileField
from django.forms import forms
from django.template.defaultfilters import filesizeformat
from django.utils.translation import ugettext_lazy as _

class ContentTypeRestrictedFileField(FileField):
    """
    Same as FileField, but you can specify:
        * content_types - list containing allowed content_types. Example: ['application/pdf', 'image/jpeg']
        * max_upload_size - a number indicating the maximum file size allowed for upload.
            2.5MB -   2621440
            5MB   -   5242880
            10MB  -  10485760
            20MB  -  20971520
            50MB  -  52428800
            100MB - 104857600
            250MB - 262144000
            500MB - 524288000
    """
    def __init__(self, *args, **kwargs):
        self.content_types = kwargs.pop("content_types")
        self.max_upload_size = kwargs.pop("max_upload_size")
        super(ContentTypeRestrictedFileField, self).__init__(*args, **kwargs)

    def clean(self, *args, **kwargs):
        data = super(ContentTypeRestrictedFileField, self).clean(*args, **kwargs)
        file = data.file
        try:
            content_type = file.content_type
            if content_type in self.content_types:
                if file._size > self.max_upload_size:
                    raise forms.ValidationError(_('Please keep filesize under %s. Current filesize %s') % (filesizeformat(self.max_upload_size), filesizeformat(file._size)))
            else:
                raise forms.ValidationError(_('Filetype not supported.'))
        except AttributeError:
            pass
        return data
Second, in your models.py, add this:
from formatChecker import ContentTypeRestrictedFileField
Then, instead of using 'FileField', use 'ContentTypeRestrictedFileField'.
Example:
class Stuff(models.Model):
    title = models.CharField(max_length=245)
    handout = ContentTypeRestrictedFileField(upload_to='uploads/', content_types=['video/x-msvideo', 'application/pdf', 'video/mp4', 'audio/mpeg'], max_upload_size=5242880, blank=True, null=True)
Those are the things you have to do when you want to accept only a certain file type in a FileField.
After I checked the accepted answer, I decided to share a tip based on the Django documentation. There is already a validator you can use to validate file extensions; you don't need to rewrite your own custom function to check whether a file extension is allowed or not.
https://docs.djangoproject.com/en/3.0/ref/validators/#fileextensionvalidator
Warning
Don’t rely on validation of the file extension to determine a file’s type. Files can be renamed to have any extension no matter what data they contain.
I think you would be best suited combining the ExtFileField that Dominic Rodger specified in his answer with the python-magic check that Daniel Quinn mentioned. If someone is smart enough to change the extension, at least you will catch them with the headers.
You can define a list of accepted MIME types in settings and then define a validator which uses python-magic to detect the MIME type and raises ValidationError if it is not accepted. Set that validator on the file form field.
The only problem is that sometimes the MIME type is application/octet-stream, which could correspond to different file formats. Has anyone overcome this issue?
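A minimal sketch of that settings-driven approach; ACCEPTED_UPLOAD_MIME_TYPES is an assumed setting name, not a Django built-in:
# validators.py -- sketch of a settings-driven MIME check using python-magic.
import magic

from django.conf import settings
from django.core.exceptions import ValidationError

def validate_mime_type(upload):
    # ACCEPTED_UPLOAD_MIME_TYPES is a made-up project setting.
    accepted = getattr(settings, 'ACCEPTED_UPLOAD_MIME_TYPES', ['application/pdf'])
    mime_type = magic.from_buffer(upload.read(2048), mime=True)
    upload.seek(0)  # rewind so the file can still be read/saved afterwards
    if mime_type not in accepted:
        raise ValidationError('Unsupported file type: %s' % mime_type)

# Usage on a form field:
# file = forms.FileField(validators=[validate_mime_type])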
Additionally, I will extend this class with some extra behaviour.
class ContentTypeRestrictedFileField(forms.FileField):
    ...
    widget = None
    ...

    def __init__(self, *args, **kwargs):
        ...
        self.widget = forms.ClearableFileInput(attrs={'accept': kwargs.pop('accept', None)})
        super(ContentTypeRestrictedFileField, self).__init__(*args, **kwargs)
When we create an instance with the param accept=".pdf,.txt", the file-browse popup will by default show only files with the passed extensions.
Just a minor tweak to @Thismatters' answer since I can't comment. According to the README of python-magic:
recommend using at least the first 2048 bytes, as less can produce incorrect identification
So changing from 1024 bytes to 2048 bytes when reading the contents of the file and getting the MIME type from that can give a more accurate result, hence:
def validate_extension(file):
    valid_mime_types = ["application/pdf", "image/jpeg", "image/png", "image/jpg"]
    file_mime_type = magic.from_buffer(file.read(2048), mime=True)  # changed from 1024 to 2048
    if file_mime_type not in valid_mime_types:
        raise ValidationError("Unsupported file type.")
    valid_file_extensions = [".pdf", ".jpeg", ".png", ".jpg"]
    ext = os.path.splitext(file.name)[1]
    if ext.lower() not in valid_file_extensions:
        raise ValidationError("Unacceptable file extension.")