Best back end file upload parallelism technique - django

In my web application (Django), I am sending multiple files to the backend. I need to implement a parallel file upload technique that makes use of the maximum CPU core capacity. I am ready to implement it in any language or with any tool.
Could someone please suggest the best tools or techniques for this, so that saving the files on the backend server (to disk or a database) completes in much less time than a normal file upload?

I don't think this is about making the upload parallel on the back end; you might instead do something on the front end, where you split your files into chunks and send the chunks to the back end via AJAX. HTML5 allows you to split a file on the front end and send it chunk by chunk to the server, so you can try one of the JS plugins that help achieve this.
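For what it's worth, the receiving side of such a chunked upload can stay very small. Below is a minimal Django sketch under assumptions not in the answer (the upload_chunk view name, the upload_id and chunk field names, and the .part file layout are all hypothetical):

```python
# Hypothetical Django view that receives one chunk per AJAX request and
# appends it to a partial file on disk; names are illustrative only.
import os
from django.conf import settings
from django.http import JsonResponse
from django.views.decorators.csrf import csrf_exempt
from django.views.decorators.http import require_POST

@csrf_exempt  # for brevity only; use proper CSRF handling in real code
@require_POST
def upload_chunk(request):
    upload_id = request.POST["upload_id"]   # chosen by the front end
    chunk = request.FILES["chunk"]          # the current slice of the file
    target = os.path.join(settings.MEDIA_ROOT, f"{upload_id}.part")

    # Append this chunk to the partial file.
    with open(target, "ab") as destination:
        for piece in chunk.chunks():
            destination.write(piece)

    return JsonResponse({"received": chunk.size})
```

Once the front end reports that the last chunk has been sent, the .part file can be renamed to its final name.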

Related

How to upload Django files in the background? Or how to increase file upload speed?

I need to increase file upload speed in Django. Are there any ways to do this? I was thinking about uploading files in the background: when the user sends a POST request, I just redirect them to some page and start uploading the files. Is there any way to do this? Or do you know other ways to increase upload speed? Thank you in advance.
Low upload speed could be the result of several issues:
It is a normal situation and your client simply can't upload at a higher speed.
Your server instance uses an old HDD and can't write quickly.
Your server is busy with other requests and serves your clients as fast as it can, but it is overloaded.
Your instance doesn't have free space on the hard drive.
Your server redirects the file as a stream somewhere else.
Your upload handler code is not optimized.
You don't use a proxy server that handles slow clients well and, once the file has fully arrived on the proxy's side, hands it to Django in a moment.
You are trying to upload a very big file.
etc.
Maybe you can share more details on how you handle the uploads and what your environment looks like.
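If the culprit turns out to be the upload handler itself (the "not optimized" case above), a minimal sketch of a chunk-by-chunk save in Django looks roughly like the following; the view name, form field name and redirect target are assumptions, not something from this thread:

```python
# Minimal sketch: stream the uploaded file to disk in chunks instead of
# reading it all into memory. Names and paths are illustrative only.
import os
from django.conf import settings
from django.http import HttpResponseRedirect

def handle_upload(request):
    uploaded = request.FILES["file"]
    target = os.path.join(settings.MEDIA_ROOT, uploaded.name)

    with open(target, "wb") as destination:
        for chunk in uploaded.chunks():   # Django yields the file in chunks
            destination.write(chunk)

    # Redirect as soon as the file is on disk; any heavy post-processing
    # can be deferred to a background job.
    return HttpResponseRedirect("/thanks/")
```

True background uploading (redirecting the user before the file has been received) is harder, because the view only runs once the request body has arrived; a reverse proxy that buffers slow clients, as mentioned above, is usually the practical answer.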

Uploading large files to server

The project I'm working on logs data on distributed devices that needs to be joined in a single database on a remote server.
The logs cannot be streamed as they are recorded (the network may not be available, etc.), so they must be sent occasionally as bulky 0.5-1 GB text-based CSV files.
As far as I understand, this means having a web service receive the data in the form of POST requests is out of the question because of the file sizes.
So far I've come up with this approach: Use some file transfer protocol (ftp or similar) to upload files from device to server. Devices would have to figure out a unique filename to do this with. Have the server periodically check for new files, process them by committing them to the database and deleting them afterwards.
It seems like a very naive way to go about it, but simple to implement.
However, I want to avoid any pitfalls before I implement any specifics. Is this approach scalable (more devices, larger files)? Implementation will be done either on a private/company-owned server or on a cloud service (Azure, for instance) - will it work on different platforms?
You could actually do this through web/HTTP as well, after setting a higher limit for POST requests in the web server (post_max_size and upload_max_filesize for PHP). This will allow devices to interact regardless of platform. It shouldn't be too hard to make a POST request to the server from any device; a simple cURL request could get the job done.
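As an illustration of that POST-based route, a device-side script might stream the file like this (the URL, the use of Python's requests library, and the timeout are assumptions; the answer only mentions cURL):

```python
# Hypothetical device-side upload: stream a large CSV to the server in a
# single POST without loading it all into memory. URL is a placeholder.
import requests

def upload_log(path, url="https://example.com/upload"):
    with open(path, "rb") as f:
        # Passing a file object as `data` makes requests stream the body.
        response = requests.post(
            url,
            data=f,
            headers={"Content-Type": "text/csv"},
            timeout=600,
        )
    response.raise_for_status()
    return response.status_code
```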
FTP is also possible. Or SCP, to make it safer.
Either way, I think this does need some application on the server to be able to fetch and manage these files using a database. Perhaps a small web application? ;)
As for the unique name, you could use a combination of the device's unique ID/name along with the current Unix time. You could even hash this (MD5/SHA-1) afterwards if you like.
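A sketch of that naming scheme (device ID plus current Unix time, hashed with MD5) could be:

```python
# Sketch of the suggested naming scheme; device_id is a placeholder value.
import hashlib
import time

def unique_filename(device_id, extension=".csv"):
    raw = f"{device_id}-{int(time.time())}"
    digest = hashlib.md5(raw.encode("utf-8")).hexdigest()
    return digest + extension

print(unique_filename("device-42"))  # e.g. '3f2c9...csv'
```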

How to compress files in Shopify to improve web speed

My website is on the Shopify platform. The Google PageSpeed test shows the following message:
Compressing resources with gzip or deflate can reduce the number of bytes sent over the network.
Enable compression for the following resources to reduce their transfer size by 1.6MiB (78% reduction).
Compressing https://sdk.azureedge.net/js/1.b... could save 1.6MiB (78% reduction).
How can I compress these files in shopify?
You can improve your Google PageSpeed result by compressing (minifying) your website's JavaScript (.js) files.
Google itself provides its Closure Compiler for this purpose.
Refer to Closure Compiler.
P.S. Take a backup of your JS file before updating it in your store, because the compiled code cannot be rolled back to its original state after compilation.
If you're on Shopify, most likely those PageSpeed warnings are coming from 3rd party apps.
In that case you can just reach out to them and ask if they will compress the file.
If you don't know which app it is, you can go to the domain name or try searching "what is [paste file path here]".
For Shopify specifically, you can get more details and a template from this post.
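If you want to check for yourself whether a given asset is already being served compressed before contacting the app vendor, a quick header check is enough. This is only a sketch, assuming Python's requests library and a placeholder URL:

```python
# Check whether a resource is served with gzip/deflate; the URL below is a
# placeholder - substitute the asset path from your PageSpeed report.
import requests

def compression_of(url):
    response = requests.get(
        url,
        headers={"Accept-Encoding": "gzip, deflate"},
        stream=True,  # headers only; don't download the whole asset
    )
    # Returns 'gzip', 'deflate', or 'none' if served uncompressed.
    return response.headers.get("Content-Encoding", "none")

print(compression_of("https://cdn.example.com/asset.js"))
```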

Efficient way to transfer data from one django application to another

Currently, I'm working on a project where I have a server-client relationship between two Django applications running on separate hosts.
The server has to store and provide a large amount of relational data, e.g. Suppliers, Companies, Products, etc.
The client downloads data on request from the server and adds it to its database. Clients can also upload from their station to the database to expand it.
The previous person who developed this used XML-RPC to transfer the vast (13 MB typical) XML file from server to client. Now, really, all we're sending are database-agnostic objects to be stored in a database, so I wondered if there was a more efficient way of doing it?
Please ask for more details if you need them; I wasn't really sure what you'd need to know.
EDIT: Efficient in terms of Networking, and Server Side Processing. Clients can do the heavy lifting.
A shared database design seems more suitable. But of course there may be security, political or organisational reasons ruling that out. Plus there would be significant re-design required.
To reduce network bandwidth first check that HTTP gzip compression is enabled.
If it's just a dumb data transfer JSON would generally be a lot more compact than XMLRPC. Does the data look amenable to a straight translation to JSON? This would still require some server-side processing.
For minimal server-side processing (if the database tables are relatively similar) it may be very efficient to just send the client a dump of the relevant db query. Of course unless the tables have the same schema you would have to do some client-side processing of raw SQL, which is not ideal.
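To make the JSON-plus-gzip option concrete, here is a minimal sketch using Django's built-in serializers; the Supplier model is borrowed from the question's examples, and the app path and file names are assumptions:

```python
# Sketch of a dumb data transfer as gzipped JSON instead of XML-RPC,
# using Django's built-in serialization framework.
import gzip
from django.core import serializers
from myapp.models import Supplier  # hypothetical app/model path

def export_suppliers(path="suppliers.json.gz"):
    # Server side: serialize the queryset to JSON and gzip it.
    payload = serializers.serialize("json", Supplier.objects.all())
    with gzip.open(path, "wt", encoding="utf-8") as f:
        f.write(payload)

def import_suppliers(path="suppliers.json.gz"):
    # Client side: decompress, deserialize and save each object.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for obj in serializers.deserialize("json", f.read()):
            obj.save()
```

gzip typically shrinks repetitive relational dumps like this dramatically, and the .save() calls happen on the client, which matches the "clients can do the heavy lifting" constraint.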

sftp versus SOAP call for file transfer

I have to transfer some files to a third party. We can invent the file format, but we want to keep it simple, like CSV. These won't be big files - a few tens of MB at most - and there won't be many: 3 files per night.
Our preference for the protocol is sftp. We've done this lots in the past and we understand it well.
Their preference is to do it via a web service/SOAP/https call.
The reasons they give is reliability, mainly around knowing that they've fully received the file.
I don't buy this as a killer argument. You can easily build something into your file transfer process using SFTP to make sure the transfer has completed, e.g. use headers/footers in the files, move the file between directories, etc.
The only other argument I can think of is that over http(s), ports 80/443 will be open, so there might be less firewall work for our infrastructure guys.
Can you think of any other arguments either way on this? Is there a consensus on what would be best practice here?
Thanks in advance.
File completeness is a common issue in "managed file transfer". If you went for a compromise "best practice", you'd end up running either AS/2 (a web service-ish way to transfer files that incorporates non-repudiation via signed integrity checks) or AS/3 (same thing over FTP or FTPS).
One of the problems with file integrity and SFTP is that you can't arbitrarily extend the protocol like you can FTP and FTPS. In other words, you can't add an XSHA1 command to your SFTP transfer just because you want to.
Yes, there are other workarounds (like transactional files that contain hashes of files received), but at the end of the day someone's going to have to do some work...but it really shouldn't be this hard.
If the third party you're talking to really doesn't have a non-web-service way to accept large files, you might be their guinea pig as they try to navigate a brand new world. (Or, they may have just fired all their transmissions folks and are only now realizing that the world doesn't operate on SOAP... yet - seen that happen too.)
Either way, unless they GIVE you the magic code/utility/whatever to do the file-to-SOAP transaction for them (and that happens too), I'd stick to your SFTP guns until they find the right person on their end to talk bulk data transmissions.
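For what it's worth, the asker's "move the file between directories" idea and the "transactional files that contain hashes" workaround are both easy to sketch. Here is a minimal example assuming Python with paramiko, with placeholder host, credentials and paths:

```python
# Sketch: upload under a temporary name, ship a SHA-256 digest alongside,
# and rename only once the transfer has finished, so the receiver never
# sees a partial file. Host/credentials are placeholders.
import hashlib
import paramiko

def sftp_upload(local_path, remote_name, host, username, password):
    # Compute a digest the receiving side can verify.
    sha = hashlib.sha256()
    with open(local_path, "rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            sha.update(block)

    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())  # demo only
    client.connect(host, username=username, password=password)
    sftp = client.open_sftp()
    try:
        # Upload under a temporary name first.
        sftp.put(local_path, remote_name + ".part")
        # Write the digest next to it.
        manifest = sftp.open(remote_name + ".sha256", "w")
        manifest.write(sha.hexdigest().encode("ascii"))
        manifest.close()
        # "Publish" the file by renaming it only when everything is in place.
        sftp.rename(remote_name + ".part", remote_name)
    finally:
        sftp.close()
        client.close()
```

The receiver only ever sees either nothing or a complete file plus its digest, which covers most of the "did we get it all?" concern without AS/2 or AS/3.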
SFTP is a protocol for secure file transfer; SOAP is an API protocol, which can be used for sending file attachments (i.e. MIME attachments) or data encoded as Base64.
SFTP adds potential complexity around separate processes for encrypting/decrypting files (at rest, if they contain sensitive data), file archiving, data latency, coordinating job scheduling, and setting up FTP service accounts.
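For completeness, the Base64 option mentioned above is easy to sketch. The envelope structure, element names and the idea of inlining the CSV are purely illustrative, not a known API of the third party:

```python
# Sketch: embed a CSV file as Base64 inside a SOAP-style XML payload.
# Element names and structure are placeholders.
import base64

def build_payload(path):
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"""<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <UploadFile>
      <FileName>{path}</FileName>
      <Content>{encoded}</Content>
    </UploadFile>
  </soap:Body>
</soap:Envelope>"""
```

Keep in mind that Base64 inflates the payload by roughly a third, which is noticeable on files of a few tens of MB.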