How to upload hundreds of documents to a Django web app? - django

I developed a Django application and I have to process 500 documents (they are uploaded as a single .zip/.rar archive and processed one by one using NLP). The problem is that when I try to upload all 500 documents, my app takes a very long time, around 2 hours. What is the best way to upload these documents using the Django framework? How can I handle them so I don't run into timeout problems or "entity too large" errors?
P.S. The user wants to upload the 500 documents compressed in a single .zip/.rar file. He wants to upload this file once, and the system has to process the 500 documents from it. So I have to find a way to handle them without overloading my web app. When I tried uploading everything in one request, the server took so long processing the 500 documents that I hit HTTP timeout errors or "entity too large" errors.

No one answered, so I will answer my own question. I used Celery + Django. Celery lets you create asynchronous tasks, so you can run a process that is independent of your web app and works in the background. You can use RabbitMQ or Redis as the message broker between that worker process and your application. Just search Google for: Celery + Django
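A minimal sketch of that setup, assuming a Redis broker and hypothetical `run_nlp` and `save_upload` helpers (none of these names come from the original post):

```python
# celery.py -- standard Celery bootstrap for a Django project
import os
from celery import Celery

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "myproject.settings")
app = Celery("myproject", broker="redis://localhost:6379/0")
app.autodiscover_tasks()

# tasks.py -- the archive is unpacked and processed in a background worker
import zipfile
from celery import shared_task

@shared_task
def process_archive(archive_path):
    """Open the uploaded .zip and run NLP on each document it contains."""
    with zipfile.ZipFile(archive_path) as archive:
        for name in archive.namelist():
            with archive.open(name) as doc:
                run_nlp(doc.read())  # hypothetical NLP routine

# views.py -- the request only stores the file and enqueues the task,
# so the HTTP response returns immediately and never hits a timeout
from django.http import JsonResponse

def upload(request):
    path = save_upload(request.FILES["archive"])  # hypothetical helper that writes the file to disk
    process_archive.delay(path)
    return JsonResponse({"status": "processing"})
```

The key point is that the view never does the two-hour work itself; it just hands the saved archive's path to the worker and responds.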

Related

What is the best way to handle uploaded images in Django Rest Framework on a production server, and how do I do it?

I'm relatively new to DRF, and I'm facing a problem where I have to handle many (thousands of) profile images. I need to do it "properly", so I guess uploading raw photos to the server is not the best way. What service/approach should I use?

How to run multiple instances of a Django app?

This question doesn't involve any code. I just want to know whether it is really possible to run multiple instances of a Django app, and if so, how.
I make Django apps and host them on Apache. I noticed a conflict when multiple users access the same app.
Let's assume it is a web-scraping app. If one user visits the app and runs the scraper, another user accessing the site from a different location can't visit the app or run the scraper until the scrape the first user started finishes.
Is it really possible to make the app independent for all the different users accessing it?
There are a few ways you could approach this. You might consider putting your app into a container (Google search: docker {your stack})
Then implement something like Docker Swarm or Kubernetes to allocate multiple instances of your app.
That being said, you might consider refactoring your app to allow multiple users. It sounds like your scraping process locks things up, but in reality there's no reason your server should block while a scrape runs.
It might be better to build your app so that when it receives a request, like someone visiting the site, the server serves the requested page, and when a user asks for a scrape/task to run, the server calls your scraper service or script asynchronously (see the sketch after this answer).
That way your app can still function while a scrape is in progress. This would be MUCH more resource-efficient (and likely simpler) than spinning up tens or hundreds of instances of your entire app.
tl;dr: use containerization for multiple instances, or refactor the app so a single user can't threadlock it.
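As a rough illustration of that refactor, here is one hedged way to hand the scrape off to a Celery worker and let clients poll for the result (task and view names are illustrative, not from the answer above):

```python
# tasks.py -- the scrape runs in a separate worker process, not in the web server
from celery import shared_task

@shared_task
def scrape_site(url):
    """Placeholder for the actual scraping logic."""
    ...

# views.py -- one view returns immediately with a task id; a second view reports progress
from celery.result import AsyncResult
from django.http import JsonResponse

def start_scrape(request):
    job = scrape_site.delay(request.GET["url"])
    return JsonResponse({"task_id": job.id})

def scrape_status(request, task_id):
    return JsonResponse({"state": AsyncResult(task_id).state})
```

With this shape, a second visitor's page request is served normally even while a scrape is running, because the web process is never blocked.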

Allow large file upload from browser while navigating to other page

I'm building a website with Django 1.11 and a fairly simple JavaScript/HTML/CSS front end (no framework like Vue.js). Each navigation triggers a full page reload, which is fine for my use case.
For convenience, I serve the website from App Engine Standard, and that has gone well so far. Now I need my users to be able to upload files (up to 300 MB). Because of App Engine's limit on request size (32 MB), I'm using signed URLs so I can send these files directly from my client-side JavaScript to Cloud Storage.
Due to the size of the files, the upload may take some time, but I can't navigate to another page because that would cancel the upload. I understand that a client-side single-page app (in Vue.js, for example) would be appropriate for a case like this, but is there a way to achieve it with my current setup without rewriting the whole website (possibly with Vue.js and a Django REST API)?
Any suggestions would be much appreciated.
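For reference, the server-side half of the setup the question describes — minting a signed upload URL with the google-cloud-storage client — might look roughly like this (the bucket name and expiry are assumptions):

```python
from datetime import timedelta

from google.cloud import storage

def make_upload_url(object_name):
    """Return a short-lived URL the browser can PUT a file to directly."""
    client = storage.Client()
    blob = client.bucket("my-upload-bucket").blob(object_name)  # assumed bucket name
    return blob.generate_signed_url(
        version="v4",
        expiration=timedelta(minutes=30),  # assumed validity window
        method="PUT",
    )
```

This only restates the approach the asker already has working; the navigation problem itself is client-side, since the upload lives in the page's JavaScript context and a full page reload abandons it.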

Heroku + Django + Amazon S3: video recording and playback

Hi everybody!
I have a Django app running on a Heroku server, and I have attached Amazon S3 storage to it.
I want to let users record webcam videos, upload them (ultimately to S3), and let other users play them back.
What is the easiest way to do that?
I have already spent more than 20 hours researching this topic, but I still have no idea.
People usually use streaming servers like RED5 + Flash players + something + something... but that seems very complicated and, as I understand it, not appropriate for Heroku.
I would appreciate any help!
The solution most people go with is to use a Flash applet to record the video into memory and then bulk-upload it at the end. Nimbb seems to be the one most people choose.
Alternatively, YouTube has a pretty neat API for recording and uploading to their servers entirely from within your site.
The YouTube Upload Widget lets your website's visitors perform both webcam and file uploads to YouTube. The support for webcam uploads sets the upload widget apart from the other uploading options that the YouTube API supports. The widget uses HTML5's postMessage support to send messages back to your website regarding videos uploaded via the widget.
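If you do keep the direct-to-S3 part of the plan from the question, one hedged option on the Django side is a presigned POST generated with boto3, so the browser uploads the recorded video straight to the bucket (the bucket name and expiry here are assumptions):

```python
import boto3

def make_s3_upload_form(key):
    """Return the URL and form fields the browser can POST a video file to."""
    s3 = boto3.client("s3")
    return s3.generate_presigned_post(
        Bucket="my-video-bucket",  # assumed bucket name
        Key=key,
        ExpiresIn=3600,  # form valid for one hour (assumed)
    )
```

This keeps Heroku's short request timeout out of the upload path, since the dyno never proxies the video bytes.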

How to log the SQL queries issued by a Django application?

I have a quite slow admin interface in a Django application; the application runs on Apache2 with PostgreSQL.
I suspect the problem is unoptimized SQL, but I can't work out which queries are at fault. I believe one query is being sent per row instead of a single query for all rows.
Is it possible to log every SQL query actually sent to my database?
Thanks for your advice.
Use the log_min_duration_statement option in the configuration file:
http://www.postgresql.org/docs/current/static/runtime-config-logging.html#GUC-LOG-MIN-DURATION-STATEMENT
You might also want to install the auto_explain module, which will also dump the execution plans of slow queries to the log file.
More details here: http://www.postgresql.org/docs/current/static/auto-explain.html
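On the Django side, a complementary option is turning on the `django.db.backends` logger, which prints every query the ORM issues while `DEBUG = True` — a quick way to spot the one-query-per-row pattern. A minimal settings.py sketch:

```python
# settings.py -- echo every SQL statement Django sends to the console.
# Note: django.db.backends only logs queries when DEBUG = True.
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        "console": {"class": "logging.StreamHandler"},
    },
    "loggers": {
        "django.db.backends": {
            "handlers": ["console"],
            "level": "DEBUG",
        },
    },
}
```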