Simply Import Task with Celery - django

I have a simple view that uploads CSV data to a mapped model and populates it. This works perfectly, but now I want to integrate Celery and I'm struggling to get the following task to work. I'm using Celery with Django and Amazon SQS.
This is the main part of my view.py which runs the task:
def upload(request):
    # If we had a POST then get the request post values.
    if request.method == 'POST':
        form = ContactUploadForm(request.POST, request.FILES)
        # Check we have valid data
        if form.is_valid():
            filename = handle_uploaded_file(request.FILES['file'])
            import_csv.delay(filename)

def handle_uploaded_file(f):
    with open('name.csv', 'wb+') as destination:
        for chunk in f.chunks():
            destination.write(chunk)
This was my first attempt at tasks.py:
@task
def import_csv(filename):
    ContactCSVModel.import_from_file(filename)
This gives the following error in the Celery log: AttributeError: 'NoneType' object has no attribute 'seek'
I think my second attempt won't work because it actually tries to upload the file to SQS and gives SQSError: 413 Request Entity Too Large. I'm assuming this is not what I want at all: it's a task, and I don't want to upload the file to SQS.
Second attempt at tasks.py:
@task
def import_csv(filename):
    ContactCSVModel.import_data(data = open(filename))
Third attempt at tasks.py, passing in the request instead:
@task
def import_csv(request):
    filename = handle_uploaded_file(request.FILES['file'])
    ContactCSVModel.import_data(data = open(filename))
This gives the error: Can't pickle <type 'cStringIO.StringO'>: attribute lookup cStringIO.StringO failed
How can I get this task working? I'm sure it's something very simple :) As you can see, I have tried a few different things above.

Following this example: http://codeinthehole.com/writing/use-models-for-uploads/
Create a new model to handle the file upload and use Celery to run the import. This way the only thing passed to the task is the upload's ID:
@task
def process_upload(upload_id):
    upload = Uploads.objects.get(id=upload_id)
    upload.process()
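The same idea, passing a small reference instead of the file itself, can be sketched without Django or Celery. This is only an illustration under stated assumptions: `save_upload` and the plain `import_csv` function below are hypothetical stand-ins for the view helper and the task body, and the JSON round trip imitates what a broker message would carry. Note that the helper returns the path; `handle_uploaded_file` in the question has no `return` statement, so `filename` there is `None`.

```python
import csv
import json
import os
import tempfile


def save_upload(chunks, directory):
    """Write uploaded chunks to a uniquely named file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".csv", dir=directory)
    with os.fdopen(fd, "wb") as destination:
        for chunk in chunks:
            destination.write(chunk)
    return path  # the caller needs this value to hand to the task


def import_csv(path):
    """Hypothetical stand-in for the task body: reopen the file by path."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


with tempfile.TemporaryDirectory() as workdir:
    path = save_upload([b"name,email\n", b"Ada,ada@example.com\n"], workdir)
    # A path (or a database ID) is a plain string, so it serializes trivially.
    payload = json.dumps({"args": [path]})
    rows = import_csv(json.loads(payload)["args"][0])
```

Passing a path only works when the web process and the worker share a filesystem; when they do not, storing the upload through a model and passing its ID, as suggested above, avoids that constraint.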

Related

Django: large CSV file download takes a long time and fails with a 502 Bad Gateway nginx error

How can I download a large CSV file when the request ends in a 502 Bad Gateway error?
I found the solution described below.
It uses a streaming response. This works like downloading a movie: the browser starts the download straight away, shows its progress, and offers to open the file once it has finished. The CSV file is delivered in the same way.
Another way to resolve this error is to increase the nginx timeout, but that affects cost, so Django streaming is the better approach.
Write a view for this in Django.
views.py
import datetime

from django.contrib import messages
from django.http import StreamingHttpResponse
from django.shortcuts import redirect
from django.views import generic

ERROR_503 = 'something went wrong.'
DASHBOARD_URL = 'path'

def get_headers():
    return ['field1', 'field2', 'field3']

def get_data(item):
    return {
        'field1': item.field1,
        'field2': item.field2,
        'field3': item.field3,
    }

class CSVBuffer(object):
    # Pseudo-buffer: write() hands the formatted row back instead of storing it.
    def write(self, value):
        return value

class Streaming_CSV(generic.View):
    model = Model_name

    def get(self, request, *args, **kwargs):
        try:
            queryset = self.model.objects.filter(is_draft=False)
            # iter_items is not defined in this snippet; it must be a generator that yields CSV rows.
            response = StreamingHttpResponse(streaming_content=(iter_items(queryset, CSVBuffer())), content_type='text/csv')
            file_name = 'Experience_data_%s' % (str(datetime.datetime.now()))
            response['Content-Disposition'] = 'attachment;filename=%s.csv' % (file_name)
        except Exception as e:
            print(e)
            messages.error(request, ERROR_503)
            return redirect(DASHBOARD_URL)
        return response
urls.py
path('streaming-csv/',views.Streaming_CSV.as_view(),name = 'streaming-csv')
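The view calls `iter_items`, which the snippet never defines. A minimal sketch of what such a generator could look like, assuming it should yield a header row followed by one CSV-formatted line per item. The `FIELDS` list and the `SimpleNamespace` objects are assumptions standing in for `get_headers()` and model instances:

```python
import csv
from types import SimpleNamespace

FIELDS = ["field1", "field2", "field3"]  # assumed to mirror get_headers()


class CSVBuffer:
    """Pseudo-buffer: write() returns the formatted row instead of storing it."""

    def write(self, value):
        return value


def iter_items(items, pseudo_buffer):
    """Yield the header line, then one CSV line per item, without buffering."""
    writer = csv.DictWriter(pseudo_buffer, fieldnames=FIELDS)
    # writerow() returns whatever pseudo_buffer.write() returned: the CSV line.
    yield writer.writerow({name: name for name in FIELDS})
    for item in items:
        yield writer.writerow({name: getattr(item, name) for name in FIELDS})


# Stand-ins for model instances; a queryset would be iterated the same way.
items = [SimpleNamespace(field1=1, field2="a", field3="x,y")]
lines = list(iter_items(items, CSVBuffer()))
```

Because each line is produced only when the response asks for it, the whole file never has to sit in memory, which is the point of streaming the download.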
For reference, see the links below.
Django documentation: https://docs.djangoproject.com/en/4.0/howto/outputting-csv/#streaming-large-csv-files
Gist: https://gist.github.com/niuware/ba19bbc0169039e89326e1599dba3a87
Related question: Adding rows manually to StreamingHttpResponse (Django)

How to upload and process large excel files using Celery in Django?

I am trying to upload and process an Excel file using Django and DRF with Celery.
When I pass the file to my Celery task to be processed in the background, I get the following error:
kombu.exceptions.EncodeError: Object of type InMemoryUploadedFile is not JSON serializable
Here is my view post request handler:
class FileUploadView(generics.CreateAPIView):
    """
    POST: upload file to save data in the database
    """
    parser_classes = [MultiPartParser]
    serializer_class = FileSerializerXLSX

    def post(self, request, format=None):
        """
        Allows to upload file and lets it be handled by pandas
        """
        serialized = FileSerializerXLSX(data=request.data)
        if serialized.is_valid():
            file_obj = request.data['file']
            # file_bytes = file_obj.read()
            print(file_obj)
            import_excel_task.delay(file_obj)
            print("its working")
            return Response(status=204)
        return Response(serialized._errors, status=status.HTTP_400_BAD_REQUEST)
And my celery task:
def import_excel_helper(file_obj):
    df = extract_excel_to_dataframe(file_obj)
    transform_df_to_clientmodel(df)
    transform_df_to_productmodel(df)
    transform_df_to_salesmodel(df)

@shared_task(name="import_excel_task")
def import_excel_task(file_obj):
    """Save excel file in the background"""
    logger.info("Importing excel file")
    import_excel_helper(file_obj)
Any idea how to pass an Excel file to a Celery task so that it can be processed by other functions in the background?
As the error says, the arguments passed to a Celery task must be JSON serializable, because JSON is the default serializer. As documented in kombu:
The primary disadvantage to JSON is that it limits you to the following data types: strings, Unicode, floats, boolean, dictionaries, and lists. Decimals and dates are notably missing.
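The limitation can be reproduced with the standard library alone. The sketch below is not Celery code: it only shows that `json.dumps` rejects raw bytes while a Base64 string survives the round trip. The `raw` value is a made-up placeholder for the content of an uploaded file.

```python
import base64
import json

# Made-up placeholder for the bytes of an uploaded spreadsheet.
raw = b"PK\x03\x04 binary spreadsheet bytes"

try:
    json.dumps({"args": [raw]})
    bytes_ok = True
except TypeError:
    # json refuses bytes, just as it refuses an uploaded-file object.
    bytes_ok = False

# Base64 turns the bytes into plain ASCII text, which JSON accepts.
encoded = base64.b64encode(raw).decode("ascii")
message = json.dumps({"args": [encoded]})
restored = base64.b64decode(json.loads(message)["args"][0])
```

Base64 makes the payload roughly a third larger, so for big files it is usually cheaper to store the file somewhere the worker can reach and pass only its path or ID.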
Let's say this is my Excel file, file.xlsx:
Some    Value
Here    :)
Solution 1
Convert the raw bytes of the Excel file into a Base64 string before calling the task so that it can be JSON serialized (strings are valid in a JSON document, raw bytes are not). Everything else in the Celery configuration keeps its default value.
tasks.py
import base64
import io

import pandas
from celery import Celery

app = Celery('tasks')

@app.task
def add(excel_file_base64):
    excel_file = base64.b64decode(excel_file_base64)
    # Wrap the bytes in a file-like object before handing them to pandas.
    df = pandas.read_excel(io.BytesIO(excel_file))
    print("Contents of excel file:", df)
views.py
import base64
from tasks import add

with open("file.xlsx", 'rb') as file:  # Change this to be your <request.data['file']>
    excel_raw_bytes = file.read()
    excel_base64 = base64.b64encode(excel_raw_bytes).decode()
    add.apply_async((excel_base64,))
Output
[2021-08-19 20:40:28,904: INFO/MainProcess] Task tasks.add[d5373444-485d-4c50-8695-be2e68ef1c67] received
[2021-08-19 20:40:29,094: WARNING/ForkPoolWorker-4] Contents of excel file:
[2021-08-19 20:40:29,094: WARNING/ForkPoolWorker-4]
[2021-08-19 20:40:29,099: WARNING/ForkPoolWorker-4] Some Value
0 Here :)
[2021-08-19 20:40:29,099: WARNING/ForkPoolWorker-4]
[2021-08-19 20:40:29,099: INFO/ForkPoolWorker-4] Task tasks.add[d5373444-485d-4c50-8695-be2e68ef1c67] succeeded in 0.19386404199940444s: None
Solution 2:
This is the harder way. Implement a custom serializer that will handle excel files.
tasks.py
import ast
import base64
import io

import pandas
from celery import Celery
from kombu.serialization import register

def my_custom_excel_encoder(obj):
    """Uncomment this block if you intend to pass it as a Base64 string:
    file_base64 = base64.b64encode(obj[0][0]).decode()
    obj = list(obj)
    obj[0] = [file_base64]
    """
    return str(obj)

def my_custom_excel_decoder(obj):
    obj = ast.literal_eval(obj)
    """Uncomment this block if you passed it as a Base64 string (as commented above in the encoder):
    obj[0][0] = base64.b64decode(obj[0][0])
    """
    return obj

register(
    'my_custom_excel',
    my_custom_excel_encoder,
    my_custom_excel_decoder,
    content_type='application/x-my-custom-excel',
    content_encoding='utf-8',
)

app = Celery('tasks')
app.conf.update(
    accept_content=['json', 'my_custom_excel'],
)

@app.task
def add(excel_file):
    # Wrap the bytes in a file-like object before handing them to pandas.
    df = pandas.read_excel(io.BytesIO(excel_file))
    print("Contents of excel file:", df)
views.py
from tasks import add

with open("file.xlsx", 'rb') as excel_file:  # Change this to be your <request.data['file']>
    excel_raw_bytes = excel_file.read()
    add.apply_async((excel_raw_bytes,), serializer='my_custom_excel')
Output
Same as Solution 1
Solution 3
You might be interested in the documentation on sending raw data without serialization.

How to zip multiple uploaded file in Django before saving it to database?

I am trying to compress a folder before saving it to database/file storage system using Django. For this task I am using ZipFile library. Here is the code of view.py:
class BasicUploadView(View):
    def get(self, request):
        file_list = file_information.objects.all()
        return render(self.request, 'fileupload_app/basic_upload/index.html',{'files':file_list})

    def post(self, request):
        zipfile = ZipFile('test.zip','w')
        if request.method == "POST":
            for upload_file in request.FILES.getlist('file'):  ## index.html name
                zipfile.write(io.BytesIO(upload_file))
            fs = FileSystemStorage()
            content = fs.save(upload_file.name,upload_file)
            data = {'name':fs.get_available_name(content), 'url':fs.url(content)}
        zipfile.close()
        return JsonResponse(data)
But I am getting the following error:
TypeError: a bytes-like object is required, not 'InMemoryUploadedFile'
Is there a solution to this problem? Since I may have to upload a folder with large files, do I have to write a custom TemporaryFileUploadHandler for this purpose? I have only recently started working with Django and it is quite new to me. Please help me with some advice.
InMemoryUploadedFile is an object that contains more than just the file, so you should open it and read its content (InMemoryUploadedFile.file is the underlying file):
InMemoryUploadedFile.open()
Open the file with open() and then read() its content, and check that the files were uploaded correctly. You could also use the with statement for both the zip archive and the uploaded file:
https://www.pythonforbeginners.com/files/with-statement-in-python
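As a rough illustration of that advice, the sketch below reads each upload and adds its bytes to an in-memory archive with `ZipFile.writestr`, since `ZipFile.write` expects a path on disk and `io.BytesIO` expects bytes rather than an upload object. The `zip_uploads` helper is hypothetical, and the `io.BytesIO` objects stand in for the entries of `request.FILES.getlist('file')`:

```python
import io
import zipfile


def zip_uploads(uploads):
    """Return the bytes of a ZIP archive built from (name, file-like) pairs."""
    buffer = io.BytesIO()
    # The with statement closes the archive and finalizes the ZIP directory.
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, fileobj in uploads:
            # writestr() takes content; write() would need a filesystem path.
            archive.writestr(name, fileobj.read())
    return buffer.getvalue()


# Stand-ins for uploaded files; each only needs a name and a read() method.
uploads = [("a.txt", io.BytesIO(b"alpha")), ("b.txt", io.BytesIO(b"beta"))]
archive_bytes = zip_uploads(uploads)

with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
    names = archive.namelist()
    first = archive.read("a.txt")
```

This version holds every file in memory while zipping, so for very large uploads it may be preferable to open the archive on disk and copy each upload into it in chunks.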

Django: How to attach a file without referencing a file destination?

I want to attach a file to an email that includes all the content a user enters in a contact form. I currently generate a PDF that records their inputs and attach that PDF from a file destination. However, I do not know how to attach additional files that the user provides on the contact form. In this case, this is represented by "msg.attach_file(upload_file)". My thoughts are:
1. Have the file uploaded to a destination; however, it needs to be renamed to a uniform name each time so I can refer to it during the attachment process (msg.attach_file).
2. Figure out a way to use request.FILES to attach it immediately without having to worry about its file name or upload destination (I am not sure if msg.attach_file is a valid command for this method).
Is there a right way to do this? I am attempting method 2 in my views.py file, which refers to my forms.py file, but it gives me an error.
Views.py
def quote_req(request):
    submitted = False
    if request.method == 'POST':
        form = QuoteForm(request.POST, request.FILES)
        company = request.POST['company']
        contact_person = request.POST['contact_person']
        upload_file = request.FILES['upload_file']
        description = 'You have received a sales contact form'
        if form.is_valid():
            data_dict = {
                'company_': str(company),
                'contact_person_': str(contact_person),
            }
            write_fillable_pdf(INVOICE_TEMPLATE_PATH, INVOICE_OUTPUT_PATH, data_dict)
            form.save()
            # assert false
            msg = EmailMessage('Contact Form', description, settings.EMAIL_HOST_USER, ['sample@mail.com'])
            msg.attach_file('/uploads/file.pdf')
            msg.attach_file(upload_file)
            msg.send(fail_silently=False)
            return HttpResponseRedirect('/quote/?submitted=True')
    else:
        form = QuoteForm()
        if 'submitted' in request.GET:
            submitted = True
Error Log
TypeError at /quote/
expected str, bytes or os.PathLike object, not InMemoryUploadedFile
Request Method: POST
Request URL: http://www.mytestingwebsitesample.com/quote/
Django Version: 2.1.3
Exception Type: TypeError
Exception Value:
expected str, bytes or os.PathLike object, not InMemoryUploadedFile
Can you try the following? Since attach_file does not accept an InMemoryUploadedFile, you may have to read it first:
upload_file = request.FILES['upload_file']
content = upload_file.read()
attachment = (upload_file.name, content, 'application/pdf')
# . . .
msg.attach(*attachment)  # attach() takes filename, content and mimetype as separate arguments
upload_file.read() returns bytes. Use attach instead of attach_file: attach_file requires the file to be saved on your filesystem, while attach takes the data itself. Because attach needs the content, pass the result of upload_file.read() rather than the upload object.
https://docs.djangoproject.com/en/2.2/topics/email/#emailmessage-objects
I have resolved my issue by employing a storage.py file that overwrites files with the same name; in my case, I am uploading each file, renaming it to a uniform name, and then having the storage file overwrite it later on rather than Django adding an extension to a file name with the same title.

Flask custom error handling for upload function

I have developed an upload form that accepts a specific .xlsx file. The requirement is to handle any exception raised when a non-xlsx file (e.g. a zip or exe file) is uploaded. I am using the pyexcel library to read the upload. I tried writing the following code to handle this exception:
The error handling code is as follows:
class FILE_TYPE_NOT_SUPPORTED_FMT(Exception):
    pass

@app.errorhandler(FILE_TYPE_NOT_SUPPORTED_FMT)
def custom_handler(errrors):
    app.logger.error('Unhandled Exception: %s', (errrors))
    return render_template('400.html'), 400
and the upload code is as follows:
@users.route("/oisdate_upload", methods=['GET', 'POST'])
@login_required
def doimport_ois_date():
    msg=None
    if request.method == 'POST':
        def OIS_date_init_func(row):
            #c.id = row['id']
            c = Ois_date(row['date'],row['on'],row['m1'],row['m2'],row['m3'],row['m6'],row['m9'],row['y1'],row['y2'],row['y3'],row['y4'],row['y5'],row['y7'],row['y10'])
            return c
        request.save_book_to_database(
            field_name='file', session=db.session,
            tables=[Ois_date],
            initializers=[OIS_date_init_func])
        msg = "Successfully uploaded"
        #return redirect(url_for('users.doimport_ois_date'), code=302)
    if((Ois_date.query.order_by(Ois_date.date.desc()).first()) is not None):
        date_query = Ois_date.query.order_by(Ois_date.date.desc()).first()
        start_date = date_query.date
        date_query1 = Ois_date.query.order_by(Ois_date.date.asc()).first()
        end_date = date_query1.date
    return render_template('OISdate_upload.html',msg=msg, start_date=start_date,end_date=end_date)
I am unable to figure out how to correctly capture the error and handle it, any feedback would be appreciated.
You have two options to handle this exception.
1) Import the exception directly from the pyexcel package and register an error handler for it, e.g.:
from pyexcel.exceptions import FileTypeNotSupported
...
@app.errorhandler(FileTypeNotSupported)
...
2) Or wrap the code that loads the spreadsheet in a try-except block and raise a custom error:
from pyexcel.exceptions import FileTypeNotSupported

class CustomError(Exception):
    pass

@app.errorhandler(CustomError)
def handle_custom_error(error):
    # do something
    pass

@app.route('/upload_excel')
def upload_excel():
    try:
        function_where_you_load_excel()
    except FileTypeNotSupported:
        raise CustomError
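The control flow of option 2 can be tried without Flask or pyexcel. In this sketch every name is a stand-in: the local `FileTypeNotSupported` class imitates the pyexcel exception, `load_spreadsheet` is a hypothetical loader that rejects anything not ending in .xlsx, and `handle` plays the role a registered error handler would play by turning the custom error into a 400 response.

```python
class FileTypeNotSupported(Exception):
    """Local stand-in for pyexcel.exceptions.FileTypeNotSupported."""


class CustomError(Exception):
    """Application-level error that an error handler would be registered for."""


def load_spreadsheet(filename):
    # Hypothetical loader: reject anything that is not an .xlsx file.
    if not filename.lower().endswith(".xlsx"):
        raise FileTypeNotSupported(filename)
    return "loaded " + filename


def upload_excel(filename):
    try:
        return load_spreadsheet(filename), 200
    except FileTypeNotSupported as exc:
        # Translate the library error into the application's own error type.
        raise CustomError("unsupported upload: " + filename) from exc


def handle(filename):
    """Imitates the framework: call the view, map CustomError to a 400."""
    try:
        return upload_excel(filename)
    except CustomError as exc:
        return str(exc), 400
```

Calling `handle("report.xlsx")` returns the loader's result with status 200, while `handle("tool.exe")` returns the error message with status 400. Raising with `from exc` keeps the original exception attached for logging.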