Saving a Base64 string representing an image in a Django database

I'm fairly new to Django and I'm looking for the best way to store Base64 images in my Django db/server.
My goal is to be able to send batches of images via HTTP requests, so it makes sense to send them Base64-encoded. I may end up rendering them on a webpage, but they will primarily be sent to a desktop application in batches.
After looking through several other posts, it seems there are three approaches, and I'm not sure which best fits my needs:
1. Store the Base64 string in a model TextField
2. Store the Base64 string in a FileField
3. Store the image in an ImageField and convert it to Base64 when needed
My concern with option 1 is that storing large text fields in the db will hinder performance; however, as I said, I'm new, so I really don't know.
Option 2 seems to make sense to me, as this is similar to how Django handles images: storing them elsewhere and just referencing the location in the db. However, I'm not sure if that is simply because SQLite does not support fields of this type. I also see the potential for additional overhead, having to open and read files vs. just reading a text field.
Lastly, option 3 appears rather unattractive for my use case: since these Base64 images will primarily be sent in batches via HTTP requests, I figured it would be better to store the already-encoded version rather than encode each image on every request.
I would greatly appreciate any insight the community could offer as to which approach might make the most sense for me to take. What are your thoughts?
Follow-up question: if I intend to convert my database to Postgres, does anything change regarding which approach I should take?

It is better not to store binary data in the database. Typically this requires escaping to create/update/retrieve the data, and thus results in less efficient access.
What is usually done is to work with a FileField [Django-doc] or an ImageField [Django-doc]. These two model fields store the file in the file system and save only the path in the database, which reduces the overhead of loading or saving an object.
You can decide to store a Base64 encoding of the file, but that will likely not be more efficient: the encoded form is roughly a third larger, so it takes more time to read from disk. Encoding to Base64 is cheap, so it will likely be more efficient to store the file in its compact binary form and produce the Base64 string in the view when it is needed.
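A minimal sketch of that approach, assuming an ImageField-backed model and a batch view roughly like the ones below (the model, field, and view names are illustrative, not taken from the question):

# models.py -- store the image on disk via ImageField (requires Pillow);
# only the file path is kept in the database.
from django.db import models

class Photo(models.Model):
    image = models.ImageField(upload_to="photos/")

# views.py -- produce the Base64 encoding on demand, e.g. for a batch endpoint.
import base64
from django.http import JsonResponse

def photo_batch(request):
    payload = []
    for photo in Photo.objects.all()[:50]:  # cap the batch size
        photo.image.open("rb")
        data = photo.image.read()
        photo.image.close()
        payload.append({
            "id": photo.pk,
            "data": base64.b64encode(data).decode("ascii"),
        })
    return JsonResponse({"photos": payload})

Switching the database to Postgres would not change this picture, since with this approach the image bytes never live in the database in the first place.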

Related

Insert base64 strings in Dexie.js

I am building an Ionic 3 app and I want to set up an upload based on the ImagePicker Cordova plugin.
I use Dexie to persist some data, and I wonder if persisting whole base64 strings would be alright. Or is it too heavy?
I want to persist the images chosen with the image picker. When an upload is suspended or stopped i would be able to restart the upload for those.
Is anybody using any other type of persistence for Base64 images?
Thank you
It depends on the size of the images. Unless images are larger than 10 megabytes, I think you are safe. There is no direct limit on document size in IndexedDB except for the quota you are given for the whole db instance, which can vary per platform and can be extended on modern platforms using navigator.storage.persist(). Do not index the property containing the large string, though, since that would affect performance badly and could eventually trigger obscure bugs.
If you target modern platforms (Chromium, Firefox, and Safari 10.1+), you don't need to convert the images to base64. Instead, you can store the binary data directly in a property of type Uint8Array.

Best way to efficiently store (and be able to filter by) hashes in Django

I'm trying to create a Django app where I can look up the hash of a file (md5, sha1, or sha256) in order to get back attributes of that file. Currently, I'm struggling with making a decision on how to efficiently store those values in the database.
I've seen Django's BinaryField, but unfortunately it appears geared purely towards storing data such as password hashes, and the documentation explicitly mentions that you cannot filter a QuerySet on that field. That, however, is of critical importance for my application. I've seen another post on SA regarding storing MD5 hashes specifically, where it is called out (with good performance numbers) that Django's UUIDField is a perfect fit. However, a UUIDField doesn't support more than 16 bytes, and so doesn't work for SHA-1 or SHA-256 hashes.
I've looked online to see if someone has come up with a custom field implementation for this, but came up dry. Does anyone have a good idea on how to proceed? I'm specifically trying to avoid storing the hash as (say) base64 or the hex-string equivalent (using a CharField); I want to store just the bytes of the hash. It seems strange to me that I can't simply store 20 or 32 raw bytes in the database and be able to filter on that.
Thanks in advance! Let me know if there is any more information I can provide.
EDIT: I am using Postgresql as the backend, and python3.
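No authoritative answer here, but a hedged sketch of the two routes the question mentions: packing a 16-byte MD5 digest into a UUIDField, and keeping longer digests as raw bytes in a BinaryField. The model and field names are invented for illustration, and whether exact lookups on BinaryField are allowed depends on the Django version (older releases documented the field as non-filterable):

import hashlib
import uuid

from django.db import models

class FileAttributes(models.Model):
    # 16-byte MD5 digest packed into a UUID column (the UUIDField trick
    # mentioned in the question).
    md5 = models.UUIDField(null=True, db_index=True)
    # Raw SHA-256 digest stored as bytes (bytea on PostgreSQL). Exact lookups
    # on BinaryField may or may not be supported by your Django version.
    sha256 = models.BinaryField(null=True)

def find_by_md5(path):
    with open(path, "rb") as f:
        digest = hashlib.md5(f.read()).digest()
    return FileAttributes.objects.filter(md5=uuid.UUID(bytes=digest)).first()

def find_by_sha256(path):
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).digest()
    return FileAttributes.objects.filter(sha256=digest).first()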

Django textarea for 50,000,000 character data

I have a Django application that deals with large text files, up to roughly 50,000,000 characters. For a variety of reasons it's desirable to store them in a model field.
We are using sqlite for dev and postgres for production.
Users do not need to enter the data via any UI.
The field does not need to be visible in the admin or elsewhere to the user.
Several questions:
Is it practical to store this much text in a TextField?
What, if any, performance issues will this likely create?
Would using a binary field improve performance?
Any guidance would be greatly appreciated.
Another consideration: when you are querying that model, make sure you use defer() on your querysets so you aren't transferring 50 MB of data down the pipe every time you retrieve an object from the db.
I highly recommend storing those files on disk or S3 or equivalent in a FileField though. You won't really be able to query on the contents of those files efficiently.
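As a small illustration of the defer() suggestion (the model and field names are made up for this sketch):

from django.db import models

class Document(models.Model):
    title = models.CharField(max_length=200)
    body = models.TextField()  # the ~50,000,000-character payload

# Fetch metadata without pulling the huge body column over the wire;
# the deferred field is only loaded if it is actually accessed.
docs = Document.objects.defer("body").all()

# Load the body explicitly only for the one object that needs it.
doc = Document.objects.only("title").get(pk=1)
text = doc.body  # triggers a second query that fetches the deferred field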
This is more related to the database you use. You are using SQLite, so look at the limits of SQLite:
"The maximum number of bytes in a string or BLOB in SQLite is defined by the preprocessor macro SQLITE_MAX_LENGTH. The default value of this macro is 1 billion (1 thousand million or 1,000,000,000)."
http://www.sqlite.org/limits.html
Besides that, it's probably better to use a TextField in Django.
A binary field wouldn't improve performance. Binary fields are meant for binary data, and you are storing text.
After some experimentation we decided to use a Django FileField and not store the file contents in PostgreSQL. Performance was the primary decision driver. With a FileField we are able to query very quickly to get the underlying file, which in turn can be accessed directly at the OS level with much higher performance than is available if the data is stored in a PostgreSQL table.
Thanks for the input. It was a big help.
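A rough sketch of that arrangement, reworking the Document model from the previous sketch so the text lives in a file rather than in a TextField:

from django.db import models

class Document(models.Model):
    title = models.CharField(max_length=200)
    # Instead of a TextField holding the contents, only a path is stored.
    content = models.FileField(upload_to="documents/")

# The query only moves the short path string through the database...
doc = Document.objects.get(pk=1)

# ...and the heavy read happens at the OS level (FieldFile.path is available
# when the default FileSystemStorage is used).
with open(doc.content.path, encoding="utf-8") as f:
    text = f.read()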

Store file metadata in an extra file

I have a bunch of image files (mostly .jpg). I would like to store metadata about these files (e.g. dominant color, color distribution, maximum gradient flow field, interest points, ...). These data fields are not fixed and are not available in all images.
Right now I am storing the metadata for each file as a separate file with the same name but a different extension. The format is just text:
metadataFieldName1 metadataFieldValue1
metadataFieldName2 metadataFieldValue2
This gets me wondering: is there a better/easier way to store this metadata? I thought of Protocol Buffers, since I need to be able to read and write this information in both C++ and Python. But how do I support the case where some metadata is not available?
I would suggest that you store such metadata within the image files themselves.
Most image formats support storing metadata. I think .jpeg supports it through Exif.
If you're on Windows you can use the WIC to store and retrieve metadata in a unified manner.
Why protocol buffers and not XML or INI files or whatever text-ish format? Just choose some format...
And what do you mean by "metadata not available"? It is up to your application to respond to such situations... what does this have to do with the storage format?
Look at http://www.yaml.org. YAML is less verbose than XML and more human-friendly to read.
There are YAML libraries for C++, Python, and many other languages.
Example:
import yaml

data = {"field1": "value1",
        "field2": "value2"}

serializedData = yaml.dump(data, default_flow_style=False)
open("datafile", "w").write(serializedData)
I thought long on this matter and went with Protocol Buffers to store metadata for my images. For each image, e.g. Image00012.jpg, I store the metadata in Image00012.jpg.pbmd. Once I had my .proto file set up, the Python class and C++ class were auto-generated. It works very well and requires me to spend little time on parsing (clearly better than writing a custom reader for YAML files).
RestRisiko brings up a good point about how I should handle metadata that is not available. The good thing about Protocol Buffers is that it supports optional/required fields. This solves my problem on that front.
The reason I think XML and INI are not good for this purpose is that much of my metadata is complex (color distribution, ...) and requires a bit of storage customization. Protocol Buffers allows me to nest proto declarations. Plus, the size of the metadata file and the parsing speed are clearly superior to my hand-rolled XML reading/writing.
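For illustration, a rough Python sketch of the sidecar-file pattern described above; the image_metadata_pb2 module and its ImageMetadata message are hypothetical and would be generated by protoc from your own .proto definition with optional fields:

import image_metadata_pb2  # hypothetical module generated by protoc

def write_metadata(image_path, dominant_color=None):
    meta = image_metadata_pb2.ImageMetadata()
    if dominant_color is not None:
        meta.dominant_color = dominant_color  # set optional fields only when known
    with open(image_path + ".pbmd", "wb") as f:
        f.write(meta.SerializeToString())

def read_metadata(image_path):
    meta = image_metadata_pb2.ImageMetadata()
    with open(image_path + ".pbmd", "rb") as f:
        meta.ParseFromString(f.read())
    # HasField() reports whether an optional field was actually set,
    # which is how missing metadata is detected.
    if meta.HasField("dominant_color"):
        print(meta.dominant_color)
    return meta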

How to get binary POST data in Django nicely?

Forgive me if this is a bit of a newbie question; I started learning Django yesterday, and I'm trying not to get into bad habits, i.e. I am trying to do things "the Django way" from the start.
I have a view that receives binary data in an HTTP POST field. Now Django, of course, auto-converts my binary data to a Unicode string.
My question is, how do I just get the raw binary data?
A couple of things occurred to me. Let request be the request I'm processing.
Using request.raw_post_data would involve parsing the data again, when apparently request.POST already stores the raw data and I am just trying to get around the on-the-fly conversion (besides, that attribute is new in the development version).
Using base64 or similar to transfer the data would work, but seems like too much overhead when the data transfer itself is not the problem.
Setting request.encoding = "foo" before getting that field (and reassigning it afterwards) doesn't work either, because I still get a Unicode string, besides feeling like a bit of a dirty hack. Using "base64" here (not as bad as using it as the transfer encoding) gives me an AssertionError.
Thanks in advance for your ideas!
EDIT:
To clarify: I am not talking about a classic file upload here, but about binary data stored in a POST field. I'd like to do it that way because the only way I want to interface with that view is via an upload script. Using a normal POST field makes both the client and the server much simpler in that case.
Some might say that storing binary data in a standard form field is a bad habit in some way :)
You could use standard library methods of Python to convert your string back to a binary representation.
Take a look at binascii — Convert between binary and ASCII.
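For example, assuming the client sends the field Base64-encoded (the field name "payload" is just for illustration), the view can turn the string back into bytes with the standard library:

import base64
import binascii

from django.http import HttpResponse, HttpResponseBadRequest

def receive_blob(request):
    encoded = request.POST.get("payload", "")
    try:
        raw = base64.b64decode(encoded)  # binascii.a2b_base64(encoded) works too
    except (binascii.Error, ValueError):
        return HttpResponseBadRequest("payload is not valid base64")
    # `raw` is now a bytes object; store or process it as needed.
    return HttpResponse("received %d bytes" % len(raw))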
Posted before the edit:
What about this piece of code (receiving an uploaded file from a POST)?
def handleFile(self, request):
    file = request.FILES["file"]
    destination = open('filename.ext', 'wb')
    for chunk in file.chunks():
        destination.write(chunk)
    destination.close()
Works for me.