Folks, I need help understanding some details about how Django saves model files. I've written a test that involves creation of files (in a temporary directory via tempfile) and has the following lines:
TEMP_DIR = tempfile.TemporaryDirectory()
TEMP_DIR_PATH = TEMP_DIR.name
...
@override_settings(MEDIA_ROOT=TEMP_DIR_PATH)
def create_photo(self, album_number, photo_number):
    ...
    p = Photo.objects.create(
        number=photo_number,
        album=album,
        added_by=self.user,
        image=SimpleUploadedFile(
            name=...,
            content=open(..., 'rb').read(),
            content_type='image/jpeg'
        ),
        remarks='-'
    )
    p.full_clean()
    p.save()
    return p
This code works, except for one thing that confuses me. The line p = Photo.objects.create causes a file to appear in the temporary directory. Then p.full_clean() does nothing to the file. However when I execute p.save(), the file disappears from the temporary directory. If I remove p.save(), the file stays there when the function returns.
So my test function
def test_image_file_present(self):
    """When a photo is added to DB, the file actually appears in MEDIA."""
    p = self.create_photo(3, 2)
    image_filename = p.image.file.name
    if not os.path.exists(image_filename):
        self.fail('Image file not found')
fails if p.save() is there but passes if I remove p.save().
Why would object.save() cause the file to disappear?
As a bonus question, what's the purpose of .save() if the file and the Django model object already appear during Photo.objects.create? I've checked that the pre_save signal is sent by Photo.objects.create() as well as by p.save().
I have the following model in Django:
class AksOrder(models.Model):
    zip_file = models.FileField(upload_to='aks_zips/%M/%S/', blank=True)
and in my views I have essentially these functions:
def gen_zip(pk, name, vars):
    zipObj = ZipFile(os.path.join('/tmp/', str(name) + '_' + str(pk) + '.zip'), 'w')
    zipObj.write(pdf_files[0].path, '/filea.pdf')
    zipObj.write(pdf_files[1].path, '/fileb.pdf')

def aksorder_complete(request, pk):
    ao = get_object_or_404(AksOrder, id=pk)
    zipObj = generate_shop_zip(ao.c.pk, ao.dl, ao.vars)
    ao.zip_file.save('file.zip', zipObj)
This is not the only version I tried, but it seems the most reasonable and logical one to me. I get the error "There is no item named 65536 in the archive". When I modify it slightly and close the file at the end of zip-writing in the first function, I get "ValueError: Attempt to use ZIP archive that was already closed". Both times, the zip file is generated properly in /tmp/. I could not work around it. And that's only locally; I need to do it for S3 later...
I finally achieved it: I added a zipObj.close() to the first function at the end and I modified the 2nd function like so:
file = open('path/to/file.zip', 'rb')
ao.zip_file.save('name.zip', file)
Apparently, opening the file in rb mode was decisive.
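A minimal standard-library sketch of that fix: the archive is closed before it is reopened in binary mode, so what gets handed over is an ordinary byte stream and not a ZipFile object. The helper name build_zip, the file name order_1.zip and the member contents are made up for illustration; in the view, the reopened handle is what would go to ao.zip_file.save().

```python
import os
import tempfile
from zipfile import ZipFile


def build_zip(zip_path, members):
    """Write each (arcname, data) pair into a new archive at zip_path."""
    # Leaving the with-block closes the archive, which is when the
    # central directory is written and the file becomes a valid zip.
    with ZipFile(zip_path, 'w') as archive:
        for arcname, data in members:
            archive.writestr(arcname, data)
    return zip_path


tmp_dir = tempfile.mkdtemp()
zip_path = build_zip(
    os.path.join(tmp_dir, 'order_1.zip'),  # hypothetical file name
    [('filea.pdf', b'first file'), ('fileb.pdf', b'second file')],
)

# Reopen the finished archive as a plain binary file. A handle like this
# (not the ZipFile object) is what a FileField's save() can read from.
with open(zip_path, 'rb') as fh:
    payload = fh.read()
```

For S3 the same shape should apply, since the storage backend only needs a readable binary file object, but that part is untested here.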
I want to edit an uploaded file on byte level (i.e. searching and removing a certain byte sequence) before saving it.
I have a pre_save signal set up in the following way:
class Snippet(models.Model):
    name = models.CharField(max_length=256, unique=True)
    audio_file = models.FileField(upload_to=generate_file_name, blank=True, null=True)

@receiver(models.signals.pre_save, sender=Snippet)
def prepare_save(sender, instance, **kwargs):
    if instance.audio_file:
        remove_headers(instance)
Now I have had problems implementing the remove_headers function in a way that I can edit the file while it is still in memory and have it stored afterwards. I tried among others the following:
def remove_headers(instance):
    byte_sequence = b'bytestoremove'
    f = instance.audio_file.read()
    file_in_hex = f.hex()
    file_in_hex = re.sub(byte_sequence.hex(), '', file_in_hex)
    x = b''
    x = x.fromhex(file_in_hex)
    tmp_file = TemporaryFile()
    tmp_file.write(x)
    tmp_file.flush()
    tmp_file.seek(0)
    instance.audio_file.save(instance.audio_file.name, tmp_file, save=True)
This first of all would result in an infinite loop. But this can be mitigated by e.g. only calling the remove_headers method on create or so. It did however not work, the file was unchanged. I also tried replacing the last line with:
instance.audio_file = File(tmp_file, name=instance.audio_file.name)
This, however, resulted in an empty file being written/saved.
Curiously when writing a test, this method seems to work:
def test_header_removed(self):
    snippet = mommy.make(Snippet)
    snippet.audio_file.save('newname.mp3', ContentFile('contentbytestoremovecontent'))
    snippet.save()
    self.assertEqual(snippet.audio_file.read(), b'contentcontent')
This test does not fail, despite the file being zero bytes in the end.
What am I missing here?
The second solution was almost correct. The reason the files ended up empty (actually this only happened to bigger files) was that sometimes you have to seek to the beginning of the file after opening it. So the beginning of remove_headers needs to be changed:
def remove_headers(instance):
    byte_sequence = b'bytestoremove'
    instance.audio_file.seek(0)
    f = instance.audio_file.read()
    file_in_hex = f.hex()
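To see why the seek matters, here is a small sketch with io.BytesIO standing in for the uploaded file. Note that it swaps the hex round trip for bytes.replace, which works on the raw bytes directly and cannot match at a half-byte offset the way a regex over a hex string can. The helper name strip_sequence is hypothetical.

```python
import io


def strip_sequence(fileobj, byte_sequence):
    """Return the content of fileobj with every byte_sequence removed."""
    fileobj.seek(0)  # read() starts at the current position, so rewind first
    data = fileobj.read()
    return data.replace(byte_sequence, b'')


upload = io.BytesIO(b'contentbytestoremovecontent')
upload.read()             # something consumed the stream earlier
leftover = upload.read()  # without a seek there is nothing left to read
cleaned = strip_sequence(upload, b'bytestoremove')
```

In the signal handler, the cleaned bytes would then presumably be wrapped in something like ContentFile before being assigned back to the field.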
I wrote a command-line routine to import a KML file into a GeoDjango application, which works fine when you feed it a locally saved KML file path (using the DataSource object).
Now I am writing a web file upload dialog to achieve the same thing. This is the beginning of the code that I have. The problem is that the GDAL DataSource object does not seem to understand Django's UploadedFile format: the upload is held in memory and is not a file path as expected.
What would be the best strategy to convert the UploadedFile to a normal file and access it through a path? I don't want to keep the file after processing.
def createFeatureSet(request):
    if request.method == 'POST':
        inMemoryFile = request.FILES['myfile']
        name = inMemoryFile.name
        POSTGIS_SRID = 900913
        ds = DataSource(inMemoryFile)  # This line doesn't work!!!
        for layer in ds:
            if layer.geom_type in (OGRGeomType('Point'), OGRGeomType('Point25D'), OGRGeomType('MultiPoint'), OGRGeomType('MultiPoint25D')):
                layerGeomType = OGRGeomType('MultiPoint').django
            elif layer.geom_type in (OGRGeomType('LineString'), OGRGeomType('LineString25D'), OGRGeomType('MultiLineString'), OGRGeomType('MultiLineString25D')):
                layerGeomType = OGRGeomType('MultiLineString').django
            elif layer.geom_type in (OGRGeomType('Polygon'), OGRGeomType('Polygon25D'), OGRGeomType('MultiPolygon'), OGRGeomType('MultiPolygon25D')):
                layerGeomType = OGRGeomType('MultiPolygon').django
DataSource is a wrapper around GDAL's C API and needs an actual file. You'll need to write your upload somewhere on the disk, for instance using a tempfile. Then you can pass the file path to DataSource.
Here is a suggested solution using a tempfile. I put the processing code in its own function which is now called.
f = request.FILES['myfile']
temp = tempfile.NamedTemporaryFile(delete=False)
temp.write(f.read())
temp.close()
createFeatureSet(temp.name, source_SRID= 900913)
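Since the question asks not to keep the file after processing, here is a sketch of the same idea with cleanup added. The names with_temp_copy and fake_process are hypothetical, and io.BytesIO stands in for the upload; with a real Django UploadedFile the chunks() branch would be taken.

```python
import io
import os
import tempfile


def with_temp_copy(uploaded, process):
    """Copy uploaded to a temp file, call process(path), then delete the copy."""
    temp = tempfile.NamedTemporaryFile(suffix='.kml', delete=False)
    try:
        with temp:
            # Django's UploadedFile offers chunks(); plain file objects
            # only have read().
            if hasattr(uploaded, 'chunks'):
                for chunk in uploaded.chunks():
                    temp.write(chunk)
            else:
                temp.write(uploaded.read())
        # The file is closed at this point, so a library can open it by path.
        return process(temp.name)
    finally:
        os.unlink(temp.name)  # remove the copy even if processing failed


fake_upload = io.BytesIO(b'<kml></kml>')
seen = {}


def fake_process(path):
    # Hypothetical stand-in for createFeatureSet(path, ...).
    seen['path'] = path
    with open(path, 'rb') as fh:
        return fh.read()


result = with_temp_copy(fake_upload, fake_process)
```

The delete=False plus explicit unlink is deliberate: the file has to be closed before another library opens it by name, and on some platforms a still-open NamedTemporaryFile cannot be opened a second time.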
I have a lot of user uploaded content and I want to validate that uploaded image files are not, in fact, malicious scripts. In the Django documentation, it states that ImageField:
"Inherits all attributes and methods from FileField, but also validates that the uploaded object is a valid image."
Is that totally accurate? I've read that compressing or otherwise manipulating an image file is a good validation test. I'm assuming that PIL does something like this....
Will ImageField go a long way toward covering my image upload security?
Django validates the image uploaded via form using PIL.
See https://code.djangoproject.com/browser/django/trunk/django/forms/fields.py#L519
try:
    # load() is the only method that can spot a truncated JPEG,
    # but it cannot be called sanely after verify()
    trial_image = Image.open(file)
    trial_image.load()
    # Since we're about to use the file again we have to reset the
    # file object if possible.
    if hasattr(file, 'reset'):
        file.reset()
    # verify() is the only method that can spot a corrupt PNG,
    # but it must be called immediately after the constructor
    trial_image = Image.open(file)
    trial_image.verify()
    ...
except Exception: # Python Imaging Library doesn't recognize it as an image
    raise ValidationError(self.error_messages['invalid_image'])
PIL documentation states the following about verify():
Attempts to determine if the file is broken, without actually decoding
the image data. If this method finds any problems, it raises suitable
exceptions. This method only works on a newly opened image; if the
image has already been loaded, the result is undefined. Also, if you
need to load the image after using this method, you must reopen the
image file.
You should also note that ImageField is only validated when the file is uploaded through a form. If you save the model yourself (e.g. using some kind of download script), the validation is not performed.
Another test is with the file command. It checks for the presence of "magic numbers" in the file to determine its type. On my system, the file package includes libmagic as well as a ctypes-based wrapper /usr/lib64/python2.7/site-packages/magic.py. It looks like you use it like:
import magic
ms = magic.open(magic.MAGIC_NONE)
ms.load()
type = ms.file("/path/to/some/file")
print type
f = file("/path/to/some/file", "r")
buffer = f.read(4096)
f.close()
type = ms.buffer(buffer)
print type
ms.close()
(Code from here.)
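If libmagic is not available, the magic-number idea can be sketched by hand for a few formats. This is a rough illustration, not a replacement for the file command: the table below only covers the PNG, JPEG and GIF signatures, and the function name sniff_image_type is made up.

```python
import io

# Leading bytes ("magic numbers") of a few common image formats.
SIGNATURES = {
    'png': (b'\x89PNG\r\n\x1a\n',),
    'jpeg': (b'\xff\xd8\xff',),
    'gif': (b'GIF87a', b'GIF89a'),
}


def sniff_image_type(fileobj):
    """Return 'png', 'jpeg' or 'gif' based on the first bytes, else None."""
    position = fileobj.tell()
    fileobj.seek(0)
    header = fileobj.read(8)
    fileobj.seek(position)  # leave the stream where we found it
    for kind, prefixes in SIGNATURES.items():
        if header.startswith(prefixes):
            return kind
    return None


png_like = io.BytesIO(b'\x89PNG\r\n\x1a\n' + b'rest of file')
script = io.BytesIO(b'#!/bin/sh\necho hello\n')
```

Like the file command, this only looks at the start of the data, so it says nothing about what follows the header.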
As to your original question: "Read the Source, Luke."
django/core/files/images.py:
"""
Utility functions for handling images.
Requires PIL, as you might imagine.
"""
from django.core.files import File

class ImageFile(File):
    """
    A mixin for use alongside django.core.files.base.File, which provides
    additional features for dealing with images.
    """
    def _get_width(self):
        return self._get_image_dimensions()[0]
    width = property(_get_width)

    def _get_height(self):
        return self._get_image_dimensions()[1]
    height = property(_get_height)

    def _get_image_dimensions(self):
        if not hasattr(self, '_dimensions_cache'):
            close = self.closed
            self.open()
            self._dimensions_cache = get_image_dimensions(self, close=close)
        return self._dimensions_cache

def get_image_dimensions(file_or_path, close=False):
    """
    Returns the (width, height) of an image, given an open file or a path. Set
    'close' to True to close the file at the end if it is initially in an open
    state.
    """
    # Try to import PIL in either of the two ways it can end up installed.
    try:
        from PIL import ImageFile as PILImageFile
    except ImportError:
        import ImageFile as PILImageFile
    p = PILImageFile.Parser()
    if hasattr(file_or_path, 'read'):
        file = file_or_path
        file_pos = file.tell()
        file.seek(0)
    else:
        file = open(file_or_path, 'rb')
        close = True
    try:
        while 1:
            data = file.read(1024)
            if not data:
                break
            p.feed(data)
            if p.image:
                return p.image.size
        return None
    finally:
        if close:
            file.close()
        else:
            file.seek(file_pos)
So it looks like it just reads the file 1024 bytes at a time until PIL says it's an image, then stops. This obviously does not integrity-check the entire file, so it really depends on what you mean by "covering my image upload security": illicit data could be appended to an image and passed through your site. Someone could DOS your site by uploading a lot of junk or a really big file. You could be vulnerable to an injection attack if you don't check any uploaded captions or make assumptions about the image's uploaded filename. And so on.
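As one illustration of the "really big file" point, a size cap can be enforced while the data is being copied in chunks, in the same 1024-byte style as the loop above. This is only a sketch: the names UploadTooLarge and copy_with_limit are made up, and a limit at the front-end web server may well be the better place for this.

```python
import io


class UploadTooLarge(Exception):
    """Raised when a stream exceeds the allowed size."""


def copy_with_limit(src, dst, max_bytes, chunk_size=1024):
    """Copy src to dst in chunks, refusing to go past max_bytes."""
    written = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            return written
        written += len(chunk)
        if written > max_bytes:
            # Stop before writing the chunk that crosses the limit.
            raise UploadTooLarge('more than %d bytes' % max_bytes)
        dst.write(chunk)


small = io.BytesIO(b'x' * 2000)
out = io.BytesIO()
copied = copy_with_limit(small, out, max_bytes=4096)
```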
I'd like to store uploaded files into a specific directory that depends on the URI of the POST request. Perhaps, I'd also like to rename the file to something fixed (the name of the file input for example) so I have an easy way to grep the file system, etc. and also to avoid possible security problems.
What's the preferred way to do this in Django?
Edit: I should clarify that I'd be interested in possibly doing this as a file upload handler to avoid writing a large file twice to the file system.
Edit2: I suppose one can just 'mv' the tmp file to a new location. That's a cheap operation if on the same file system.
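The 'mv' idea could look roughly like this with shutil.move, which renames the file when source and destination are on the same file system and falls back to copy-and-delete otherwise. All paths and names below are placeholders created in temporary directories for the sake of a runnable sketch.

```python
import os
import shutil
import tempfile

upload_tmp = tempfile.mkdtemp()  # stands in for the upload temp directory
target_dir = os.path.join(tempfile.mkdtemp(), 'videos_upload')

# Pretend this is the temp file the upload was streamed into.
src = os.path.join(upload_tmp, 'tmpabc.upload')
with open(src, 'wb') as fh:
    fh.write(b'payload')

os.makedirs(target_dir, exist_ok=True)
# Fixed name chosen by the server, not taken from the client.
dest = shutil.move(src, os.path.join(target_dir, 'file'))
```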
Here is a fixed version of olooney's example. It is working now:
@csrf_exempt
def upload_video_file(request):
    folder = 'tmp_dir2/' #request.path.replace("/", "_")
    uploaded_filename = request.FILES['file'].name
    BASE_PATH = '/home/'
    # create the folder if it doesn't exist.
    try:
        os.mkdir(os.path.join(BASE_PATH, folder))
    except OSError:
        # most likely the folder already exists
        pass
    # save the uploaded file inside that folder.
    full_filename = os.path.join(BASE_PATH, folder, uploaded_filename)
    fout = open(full_filename, 'wb+')
    file_content = ContentFile( request.FILES['file'].read() )
    try:
        # Iterate through the chunks.
        for chunk in file_content.chunks():
            fout.write(chunk)
        fout.close()
        html = "<html><body>SAVED</body></html>"
        return HttpResponse(html)
    except Exception:
        html = "<html><body>NOT SAVED</body></html>"
        return HttpResponse(html)
Django gives you total control over where (and if) you save files. See: http://docs.djangoproject.com/en/dev/topics/http/file-uploads/
The below example shows how to combine the URL and the name of the uploaded file and write the file out to disk:
def upload(request):
    # BASE_PATH is assumed to be defined elsewhere, e.g. in settings.
    folder = request.path.replace("/", "_")
    uploaded_filename = request.FILES['file'].name
    # create the folder if it doesn't exist.
    try:
        os.mkdir(os.path.join(BASE_PATH, folder))
    except OSError:
        # most likely the folder already exists
        pass
    # save the uploaded file inside that folder.
    full_filename = os.path.join(BASE_PATH, folder, uploaded_filename)
    fout = open(full_filename, 'wb+')
    # Iterate through the chunks of the uploaded file, not of the output file.
    for chunk in request.FILES['file'].chunks():
        fout.write(chunk)
    fout.close()
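For reference, the folder-plus-chunks logic can be tried out without Django at all. In this sketch the helper name save_upload is made up, the chunks are passed in as a plain list (in a view they would come from request.FILES['file'].chunks()), and the file name is reduced to its base name as a minimal guard against path components sent by the client.

```python
import os
import tempfile


def save_upload(base_path, request_path, filename, chunks):
    """Write chunks to base_path/<folder from request_path>/<filename>."""
    folder = request_path.strip('/').replace('/', '_') or 'root'
    target_dir = os.path.join(base_path, folder)
    os.makedirs(target_dir, exist_ok=True)  # fine if it already exists
    # Drop any directory part a client may have put into the file name.
    safe_name = os.path.basename(filename.replace('\\', '/'))
    full_filename = os.path.join(target_dir, safe_name)
    with open(full_filename, 'wb') as fout:
        for chunk in chunks:
            fout.write(chunk)
    return full_filename


base = tempfile.mkdtemp()
saved = save_upload(base, '/videos/upload/', '../../clip.mp4', [b'abc', b'def'])
```

Using a with-block means the output file is closed even if one of the writes fails, which the explicit fout.close() version does not guarantee.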
Edit: How to do this with a FileUploadHandler? I traced down through the code and it seems like you need to do four things to repurpose the TemporaryFileUploadHandler to save outside of FILE_UPLOAD_TEMP_DIR:
1. Extend TemporaryUploadedFile and override __init__() to pass through a different directory to NamedTemporaryFile. It can use the try/mkdir/except/pass pattern I showed above.
2. Extend TemporaryFileUploadHandler and override new_file() to use the above class.
3. Also extend __init__() to accept the directory where you want the folder to go.
4. Dynamically add the request handler, passing through a directory determined from the URL:
request.upload_handlers = [ProgressBarUploadHandler(request.path.replace('/', '_'))]
While non-trivial, it's still easier than writing a handler from scratch: In particular, you won't have to write a single line of error-prone buffered reading. Steps 3 and 4 are necessary because FileUploadHandlers are not passed request information by default, I believe, so you'll have to tell it separately if you want to use the URL somehow.
I can't really recommend writing a custom FileUploadHandler for this. It's really mixing layers of responsibility. Relative to the speed of uploading a file over the internet, doing a local file copy is insignificant. And if the file's small, Django will just keep it in memory without writing it out to a temp file. I have a bad feeling that you'll get all this working and find you can't even measure the performance difference.