Convert multiple files from base64 to PDF in memory - Django

I am receiving several files in base64 format and need to upload them to AWS S3 as PDFs. I have tried everything so far and still can't get it to work. Is there any way to convert them to PDF without creating a file?
I'm using Django REST Framework.
"balance":"base64String",
"stateOfCashflow":"base64String",
"financialStatementAudit":"base64String",
"managementReport":"base64String",
"certificateOfStockOwnership":"base64String",
"rentDeclaration":"base64String",

I solved it on my own: I found a library called drf_extra_fields that does just what I need.
In the serializer you use a Base64FileField, which takes the base64 string and decodes it into a file behind the scenes. The subclass below only accepts data that PyPDF2 can open as a PDF:
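import io

from drf_extra_fields.fields import Base64FileField
from PyPDF2 import PdfReader
from PyPDF2.errors import PdfReadError
from rest_framework import serializers

from .models import Document  # hypothetical model with a FileField named "pdf"


class PDFBase64File(Base64FileField):
    ALLOWED_TYPES = ['pdf']

    def get_file_extension(self, filename, decoded_file):
        # PyPDF2 >= 2.0 API; older code used PdfFileReader and PyPDF2.utils.PdfReadError
        try:
            PdfReader(io.BytesIO(decoded_file))
        except PdfReadError as e:
            print(e)
            return None  # not in ALLOWED_TYPES, so the field raises a ValidationError
        else:
            return 'pdf'


class PDFSerializer(serializers.ModelSerializer):
    pdf = PDFBase64File()

    class Meta:
        model = Document
        fields = "__all__"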
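The serializer above validates one field at a time. For the original question (several base64 fields, no file on disk), the decoding step itself can be done entirely in memory with the standard library. A minimal sketch, assuming the values are plain base64 strings, optionally carrying a `data:application/pdf;base64,` prefix; `decode_pdf_fields` is a hypothetical helper name:

```python
import base64
import binascii
import io

PDF_MAGIC = b"%PDF-"  # PDF files start with this header


def decode_pdf_fields(payload):
    """Decode each base64 string in `payload` into an in-memory PDF file.

    Returns a dict mapping field name -> io.BytesIO positioned at offset 0.
    Raises ValueError if a value is not valid base64 or does not look like a PDF.
    """
    files = {}
    for field, value in payload.items():
        # Tolerate a data-URL prefix such as "data:application/pdf;base64,"
        if value.startswith("data:") and "base64," in value:
            value = value.split("base64,", 1)[1]
        try:
            raw = base64.b64decode(value, validate=True)
        except binascii.Error as exc:
            raise ValueError(f"{field}: invalid base64 ({exc})") from exc
        # Cheap sanity check on the header only, not full PDF validation
        if not raw.startswith(PDF_MAGIC):
            raise ValueError(f"{field}: decoded data is not a PDF")
        files[field] = io.BytesIO(raw)
    return files
```

Each `BytesIO` could then be handed to an S3 client without touching disk, for example boto3's `upload_fileobj(fileobj, bucket, key, ExtraArgs={"ContentType": "application/pdf"})`; boto3, the bucket and the key are assumptions here, since the question doesn't say which S3 library is used.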

Related

Django channels image saving, TextField or ImageField

Chat app using Django Channels
I am using WebSockets to send a base64-encoded string to the server. The string could either be saved in a TextField, or decoded with the base64 library and saved in an ImageField. Which method is preferred, and why?
EDIT
I am interested in which method is preferred and why, not in how to implement it.
You can use this function to convert the base64 data you get from your request into a Django ContentFile, which can then be assigned to an image model field. It expects a data URL of the form data:image/png;base64,<base64 data>.
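import base64

from django.core.files.base import ContentFile


def base64_file(data, name=None):
    # Split "data:image/png;base64,<data>" into its header and payload
    _format, _img_str = data.split(';base64,')
    _name, ext = _format.split('/')  # "data:image", "png"
    if not name:
        name = _name.split(":")[-1]  # fall back to "image"
    return ContentFile(base64.b64decode(_img_str), name='{}.{}'.format(name, ext))


# Simple example (inside a view). Long base64 strings belong in the POST body
# rather than the query string, where URL length limits apply.
data = request.POST.get('base64data')
data = base64_file(data, name='profile_picture')
profile = request.user.userprofile  # hypothetical: however you load the UserProfile instance
profile.profile_picture = data
profile.save()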
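On the question of which storage is preferable, one concrete factor is size: base64 turns every 3 bytes into 4 ASCII characters, so a TextField holding the encoded string takes about a third more space than the decoded image saved through an ImageField (which keeps the file in storage and only its name in the database). A small stdlib illustration; the 300,000-byte size is arbitrary:

```python
import base64
import os


def base64_overhead(raw: bytes) -> float:
    """Ratio of base64 text length to raw byte length."""
    return len(base64.b64encode(raw)) / len(raw)


# 300,000 random bytes standing in for a decoded image (size chosen arbitrarily)
image_bytes = os.urandom(300_000)
encoded = base64.b64encode(image_bytes)

print(len(image_bytes), len(encoded))  # 300000 400000
print(base64_overhead(image_bytes))    # 1.3333333333333333
```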

How to convert an uploaded file (InMemoryUploadedFile) from pdf to jpeg in Django using wand?

I'm trying to convert a PDF file uploaded in Django to a JPG file. I would like to use the file directly while it is still an InMemoryUploadedFile.
I tried to use Wand, but without any success.
Here is the code I wrote:
from django.shortcuts import render
from wand.image import Image as wi

# Create your views here.
def readPDF(request):
    context = {}
    if request.method == 'POST':
        uploaded_file = request.FILES['document']
        if uploaded_file.content_type == 'application/pdf':
            pdf = wi(filename=uploaded_file.name, resolution=300)
            pdfImage = pdf.convert("jpeg")
            return render(request, 'readPDF.html', {"pdf": pdfImage})
I tried different things, like passing uploaded_file.file or uploaded_file.name as the first argument to the Wand Image, but without any success.
Thanks in advance for your help!
You should be able to pass the InMemoryUploadedFile directly to Wand's constructor via the file argument.
uploaded_file = request.FILES['document']
if uploaded_file.content_type == 'application/pdf':
    with wi(file=uploaded_file, resolution=300) as pdf:
        pass  # ...
However, I wouldn't recommend converting PDF pages to JPEGs inside an HTTP request. It's better to write the document to storage and have a background worker handle the slow / unsafe tasks.
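The background-worker idea can be sketched with the standard library alone: the view stores the upload, enqueues its path and returns immediately, while a worker thread does the slow conversion. In a real Django deployment this job is normally given to a task queue such as Celery or RQ (an assumption; the answer doesn't name one), because an in-process thread loses queued work when the server restarts. `start_worker` and the `convert` callable are hypothetical placeholders for the Wand conversion:

```python
import queue
import threading


def start_worker(jobs, results, convert):
    """Start a daemon thread that applies `convert` to each path taken from `jobs`.

    `convert` stands in for the slow PDF-to-JPEG step; a None job stops the worker.
    """
    def run():
        while True:
            path = jobs.get()
            if path is None:  # sentinel: shut down
                jobs.task_done()
                return
            try:
                results.put((path, convert(path)))
            except Exception as exc:  # record the failure, keep the worker alive
                results.put((path, exc))
            finally:
                jobs.task_done()

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    return worker


jobs, results = queue.Queue(), queue.Queue()
# Dummy converter for illustration: maps a stored PDF path to a JPEG path.
worker = start_worker(jobs, results, lambda path: path.replace(".pdf", ".jpg"))
jobs.put("uploads/report.pdf")  # what a view would do after saving the upload
jobs.put(None)
worker.join()
done = results.get()
print(done)  # ('uploads/report.pdf', 'uploads/report.jpg')
```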

How to retrieve a .wav file through POST in django and store it in a data model?

I am learning VXML and Django. I am trying to find out how to cleanly retrieve a recording from a voice-XML (VXML) browser and pass it to the server side, where I use Django to further handle the information. Then I want to store the recording as a .wav file so I can replay it later. I have the following code snippets:
In the VXML file:
<record name="recording" />
[here I record the recording]
<filled>
<submit next="/url/" method="post" namelist="recording"/>
</filled>
In Django's urls.py, I would have:
url(r'^url$', views.index, name='index')
The views.index definition:
def index(request):
    _recording = [..retrieve .wav from request here]
    _modelObject = ModelObject(recording=_recording)
    _modelObject.save()  # store recording in some database
    return render(request, 'genericfile.xml', content_type='text/xml')
In models.py I'd guess I would have a class like:
from django.db import models

class ModelObject(models.Model):
    recording = [declare type of .wav file here]
How would I go about completing the steps in the [..] in a clean manner?
I haven't worked with VXML before, but it looks like you want to store both the .xml and the .wav formats.
Here is my solution in this case:
from django.db import models


class ModelObject(models.Model):
    # Define a TextField, or anything else that can store a long string,
    # for the _recording variable above.
    recording = models.TextField()

    def save(self, *args, **kwargs):
        if self.recording:
            # Convert vxml to wav and store to a file
            pass
        super(ModelObject, self).save(*args, **kwargs)

    @property
    def recording_wav(self):
        if not self.recording:
            return None
        return 'path/to/file.wav'
Remember to use a post_delete signal to remove file.wav once a ModelObject instance is deleted.
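Whichever field type is used, it can help to confirm that the posted bytes really are a WAV file before saving them. A minimal stdlib sketch using the `wave` module; it assumes the recording reaches the view as raw bytes (for example read from `request.FILES`, which is an assumption about how the VXML browser submits the form), and `describe_wav` is a hypothetical helper name:

```python
import io
import wave


def describe_wav(data: bytes) -> dict:
    """Parse WAV bytes in memory; raises wave.Error (or EOFError) if they aren't a WAV."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width": wav.getsampwidth(),
            "frame_rate": rate,
            "duration_s": frames / rate,
        }


# Build one second of 8 kHz, 16-bit mono silence to stand in for a recording.
buf = io.BytesIO()
with wave.open(buf, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(8000)
    out.writeframes(b"\x00\x00" * 8000)

info = describe_wav(buf.getvalue())
print(info)  # {'channels': 1, 'sample_width': 2, 'frame_rate': 8000, 'duration_s': 1.0}
```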

Django REST Framework FileField Data in JSON

In Django REST Framework (DRF), how do I support deserializing base64-encoded binary data?
I have a model:
class MyModel(Model):
    data = models.FileField(...)
and I want to be able to send this data base64-encoded rather than having to use multipart form data or a "File Upload". Looking at the parsers, only FileUploadParser and MultiPartParser seem to parse out files.
I would like to be able to send this data in something like JSON (i.e. send the binary data in the data rather than in the files):
{
    "data": "..."
}
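I solved it by creating a new parser:
import base64

from django.core.files.uploadedfile import SimpleUploadedFile
from rest_framework import parsers


def get_B64_JSON_Parser(fields):
    class Impl(parsers.JSONParser):
        media_type = 'application/json+b64'

        def parse(self, *args, **kwargs):
            ret = super(Impl, self).parse(*args, **kwargs)
            for field in fields:
                if field in ret:  # skip fields the client didn't send
                    # base64.b64decode replaces Python 2's str.decode('base64')
                    ret[field] = SimpleUploadedFile(name=field, content=base64.b64decode(ret[field]))
            return ret
    return Impl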
which I then use in the view:
class TestModelViewSet(viewsets.ModelViewSet):
    parser_classes = [get_B64_JSON_Parser(('data_file',))]
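In Python 3, base64 decoding goes through `base64.b64decode`; Python 2's `str.decode('base64')` no longer exists. The decoding step such a parser performs can be checked on its own without DRF. A minimal stdlib sketch, where `parse_b64_json` is a hypothetical name and fields missing from the body are simply skipped:

```python
import base64
import json


def parse_b64_json(body: bytes, fields):
    """Parse a JSON request body and base64-decode the listed fields to bytes."""
    data = json.loads(body)
    for field in fields:
        if field in data:  # tolerate clients that omit a field
            data[field] = base64.b64decode(data[field])
    return data


body = json.dumps({"data_file": base64.b64encode(b"hello").decode(), "title": "x"}).encode()
parsed = parse_b64_json(body, ("data_file",))
print(parsed)  # {'data_file': b'hello', 'title': 'x'}
```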
This is an old question, but for those looking for an up-to-date solution, there is a plugin for DRF (drf_base64) that handles this situation. It allows reading files encoded as base64 strings in the JSON request.
So given a model like:
class MyModel(Model):
    data = models.FileField(...)
and an expected JSON like:
{
    "data": " ....",
    ...
}
(De)serialization can then be handled just by importing from the drf_base64 modules instead of from DRF itself:
from drf_base64.serializers import ModelSerializer

from .models import MyModel


class MyModelSerializer(ModelSerializer):
    class Meta:
        model = MyModel
        fields = '__all__'
Just remember that it is possible to get a base64-encoded file in JavaScript with the FileReader API.
There's probably something clever you can do at the serialiser level, but the first thing that comes to mind is to do it in the view.
Step 1: Write the file. Something like:
import base64

with open("/path/to/media/folder/fileToSave.ext", "wb") as fh:
    fh.write(base64.b64decode(fileData))  # Python 3 equivalent of fileData.decode('base64')
Step 2: Set the file on the model. Something like:
instance = self.get_object()
instance.file_field.name = 'folder/fileToSave.ext' # `file_field` was `data` in your example
instance.save()
Note the absolute path at Step 1 and the path relative to the media folder at Step 2.
This should at least get you going.
Ideally you'd specify this as a serialiser field and get validation and auto-assignment to the model instance for free. But that seems complicated at first glance.

How does one use magic to verify file type in a Django form clean method?

I have written an email form class in Django with a FileField. I want to check the uploaded file's type by checking its MIME type, and then limit uploads to PDF, Word, and OpenOffice documents.
To this end, I have installed python-magic and would like to check file types as follows per the specs for python-magic:
mime = magic.Magic(mime=True)
file_mime_type = mime.from_file('address/of/file.txt')
However, freshly uploaded files don't have a path on my server. I also don't know of any method of the mime object akin to "from_file_content" that checks the MIME type given the content of the file.
What is an effective way to use magic to verify file types of uploaded files in Django forms?
Stan described a good variant using a buffer. Unfortunately, the weakness of that method is that it reads the file into memory. Another option is to use a temporarily stored file:
import tempfile

import magic

with tempfile.NamedTemporaryFile() as tmp:
    for chunk in form.cleaned_data['file'].chunks():
        tmp.write(chunk)
    tmp.flush()  # make sure all bytes are on disk before libmagic reads the file
    print(magic.from_file(tmp.name, mime=True))
Also, you might want to check the file size:
if form.cleaned_data['file'].size < ...:
    print(magic.from_buffer(form.cleaned_data['file'].read()))
else:
    # store to disk (the code above)
    ...
Additionally, the Python tempfile documentation notes:
"Whether the name can be used to open the file a second time, while the named temporary file is still open, varies across platforms (it can be so used on Unix; it cannot on Windows NT or later)."
So you might want to handle it like this:
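import os
import tempfile

import magic

tmp = tempfile.NamedTemporaryFile(delete=False)
try:
    for chunk in form.cleaned_data['file'].chunks():
        tmp.write(chunk)
    tmp.close()  # close first so the file can be reopened by name on Windows
    print(magic.from_file(tmp.name, mime=True))
finally:
    tmp.close()  # harmless if already closed
    os.unlink(tmp.name)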
Also, you might want to seek(0) after read() (f being the uploaded file):
if hasattr(f, 'seek') and callable(f.seek):
    f.seek(0)
See also "Where uploaded data is stored" in the Django file uploads documentation.
Why not try something like this in your view:
m = magic.Magic()
m.from_buffer(request.FILES['my_file_field'].read())
Or use request.FILES in place of form.cleaned_data if django.forms.Form is really not an option.
mime = magic.Magic(mime=True)
attachment = form.cleaned_data['attachment']
if hasattr(attachment, 'temporary_file_path'):
    # The file is a temporary file on disk, so we can get its full path.
    mime_type = mime.from_file(attachment.temporary_file_path())
else:
    # The file is in memory.
    mime_type = mime.from_buffer(attachment.read())
Also, you might want to seek(0) after read() (here f is attachment):
if hasattr(f, 'seek') and callable(f.seek):
    f.seek(0)
This snippet is taken from Django's own code, which does this for image fields during validation.
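If installing libmagic is not an option, the core idea can be approximated in pure Python by comparing the first bytes of the upload against a few well-known signatures, reading only a header instead of the whole file and rewinding afterwards as suggested above. This is a much coarser check than python-magic: a ZIP signature matches any ZIP archive, not just Word or OpenOffice documents. A minimal sketch; `sniff_container`, its labels, and the 2048-byte header size are assumptions of this example:

```python
import io

# A few well-known leading signatures ("magic numbers"). The labels are this
# sketch's own names, not the MIME strings libmagic would report.
SIGNATURES = {
    b"%PDF-": "pdf",
    b"PK\x03\x04": "zip-container",  # .docx and .odt are ZIP archives
    b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1": "ole2-container",  # legacy .doc
}


def sniff_container(f, header_size=2048):
    """Guess the container type from the first bytes of a file-like object.

    Only `header_size` bytes are read, and the file is rewound afterwards
    so later code (e.g. saving the upload) still sees the whole content.
    """
    header = f.read(header_size)
    if hasattr(f, "seek") and callable(f.seek):
        f.seek(0)
    for signature, label in SIGNATURES.items():
        if header.startswith(signature):
            return label
    return None  # unknown: reject, or fall back to python-magic


upload = io.BytesIO(b"%PDF-1.7\n%fake body")
print(sniff_container(upload))  # pdf
print(upload.tell())            # 0
```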
You can use the django-safe-filefield package to validate that an uploaded file's extension matches its MIME type.
from django import forms
from safe_filefield.forms import SafeFileField


class MyForm(forms.Form):
    attachment = SafeFileField(
        allowed_extensions=('xls', 'xlsx', 'csv')
    )
If you're handling a file upload and are only concerned with images, Django will set content_type for you (or rather for itself?):
from django.core.files import File
from django.db import models
from django.forms import ModelForm


class MyPhoto(models.Model):
    # photo_upload_to is your own upload_to path or callable
    photo = models.ImageField(upload_to=photo_upload_to, max_length=1000)


class MyForm(ModelForm):
    class Meta:
        model = MyPhoto
        fields = ['photo']


with open('1.jpeg', 'rb') as fh:
    form = MyForm(files={'photo': File(fh)})
    if form.is_valid():
        print(form.instance.photo.file.content_type)
It doesn't rely on the content type provided by the user. But django.db.models.fields.files.FieldFile.file is an undocumented property.
Actually, content_type is initially set from the request, but when the form is validated, the value is updated.
Regarding non-images, doing request.FILES['name'].read() seems okay to me. First, that's what Django does. Second, by default, files larger than 2.5 MB (the FILE_UPLOAD_MAX_MEMORY_SIZE setting) are stored in a temporary file on disk rather than in memory. So let me point you at the other answer here.
For the curious, here's the stack trace that leads to updating content_type:
django.forms.forms.BaseForm.is_valid: self.errors
django.forms.forms.BaseForm.errors: self.full_clean()
django.forms.forms.BaseForm.full_clean: self._clean_fields()
django.forms.forms.BaseForm._clean_fields: field.clean()
django.forms.fields.FileField.clean: super().clean()
django.forms.fields.Field.clean: self.to_python()
django.forms.fields.ImageField.to_python