In my Compojure app, where should I store user-uploaded files? Do I just make a user-upload dir in my project root and stick everything in there? Is there anything special I should do (classpath, permissions, etc.)?
To answer your question properly, you need to think about the lifecycle of the uploaded files. I would start by answering questions such as:
how big are the files going to be?
what storage options will hold enough data to store all the uploads?
how about SLAs, redundancy and disaster avoidance?
how, and by whom, will the free space and health of the storage be monitored?
In general, the file system location is much less relevant than the block device sitting behind it: as long as your data is stored safely enough for your application, user-upload can live anywhere and be anything from a regular disk to an S3 bucket (e.g. mounted via s3fs-fuse).
Putting such a folder on your classpath sounds odd to me. It gives no essential benefit, as you will always need to go through a configuration entry stating where to store and read files from.
Permission-wise, your application will require at least write access to the upload storage (and most likely read access as well). How you grant those permissions depends on the physical device you choose: if you opt for the local file system, as you suggest in your question, you need to make sure the Clojure app runs as a user with read and write permission on that directory; in the case of S3, you will need to configure API keys instead.
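To make the configuration and permission points concrete, here is a minimal sketch (in Python for brevity; the same idea carries over to a Compojure app, and UPLOAD_DIR plus the helper names are invented for illustration). The storage location comes from configuration, and the app fails fast at startup if it lacks the needed permissions:

```python
import os

# Illustrative: the storage location comes from configuration, not the classpath.
UPLOAD_DIR = os.environ.get("UPLOAD_DIR", "/var/lib/myapp/uploads")

def check_upload_dir(path: str) -> None:
    """Fail fast at startup if the app's user cannot read/write the storage."""
    os.makedirs(path, exist_ok=True)
    if not os.access(path, os.R_OK | os.W_OK):
        raise PermissionError(f"app user needs read/write access to {path}")

def save_upload(filename: str, data: bytes) -> str:
    """Store one uploaded file and return the path it was written to."""
    # Real code should also sanitize `filename` before joining paths.
    dest = os.path.join(UPLOAD_DIR, filename)
    with open(dest, "wb") as f:
        f.write(data)
    return dest
```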
For anything other than a practice problem, I would suggest using a database such as Postgres or Datomic. This way, you get the reliability of a DB with real transactions, along with the ability to access the files across a network from any location.
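As a rough sketch of that approach (Python and psycopg2 here purely for illustration; the uploads table and connection string are made up), storing files as bytea rows gives you transactional writes and network access through the normal DB connection:

```python
import psycopg2  # assumes a reachable Postgres server

conn = psycopg2.connect("dbname=myapp user=myapp")  # illustrative DSN

def store_file(conn, name: str, data: bytes) -> int:
    """Insert a file as a bytea row inside a real transaction."""
    with conn, conn.cursor() as cur:  # commits on success, rolls back on error
        cur.execute(
            "INSERT INTO uploads (name, content) VALUES (%s, %s) RETURNING id",
            (name, psycopg2.Binary(data)),
        )
        return cur.fetchone()[0]

def fetch_file(conn, file_id: int) -> bytes:
    with conn, conn.cursor() as cur:
        cur.execute("SELECT content FROM uploads WHERE id = %s", (file_id,))
        return bytes(cur.fetchone()[0])
```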
Related
I wanted to know whether cloud platforms such as Azure and Amazon zeroize the content of the hard disk whenever an 'instance' is 'deleted', before making it available to other users.
I've tried using the 'dd' command on an Amazon Lightsail instance and it appears that the raw data is indeed zeroized. However, I was not sure if that was by chance (I just tried a few random lengths) or if they actually take care to do it.
The concern is that if I leave passwords in configuration files, someone who comes along would (theoretically) be able to read them. The same goes for data in a database.
Generally, the solution Azure applies to this concern is storage encryption.
Your data is encrypted by default at the platform level with a key specific to your subscription; when the data or resource is removed, whether or not the storage is zeroed, it is effectively inaccessible to a resource deployed on the same storage in another subscription.
I know about S3 storage, but I am trying to see if things can work out using only the filesystem.
The main reason to use a service like S3 is scalability. Imagine that you use the file system of a single server to store files. Then everyone who visits your site and wants a file has to hit that same server, and with enough visitors this will eventually render the system unresponsive.
Scalable storage services store the same data on multiple servers so they can keep serving content as the number of requests grows. Furthermore, a request is normally served by a server close to the user's location, which minimizes the delay in fetching a file.
Finally, such storage services are more reliable. If you use a single disk to store all the files, that disk may eventually fail and lose all the data. By storing the data in multiple locations, it is far less likely that the files are completely lost.
I am new to Django, so I apologize if I have missed something. I would like a library that gives me a single-instance data store for blob/binary data, one that masks whether the files are stored in the database, on the file system, or in a back end like Amazon S3. I want a single API that lets me add files and get back URLs to serve those files. It would also be nice if the implementation supported some kind of migration, so that blobs kept in the database when a site first starts out could later be moved to an S3 bucket behind the scenes without changing how my application stores and serves the data.
An important sub-aspect of this is that the files must be shown only to properly authorized users (i.e. just putting them in an open /media/ folder as files is not sufficient).
Perhaps I am asking too much, but I find this kind of service very useful in my applications. The main reason I am asking is that unless I find such a thing, I will wander off and build my own library, and I just don't want to waste the time if this kind of thing already exists.
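For what it's worth, Django's built-in storage abstraction is close to the API described above: you save through a backend-agnostic interface and get a name and URL back, and switching the backend (local disk, or S3 via a package such as django-storages) is a settings change rather than a code change. A minimal sketch, with the function and path names invented for illustration:

```python
from django.core.files.base import ContentFile
from django.core.files.storage import default_storage

def save_blob(data: bytes, name: str) -> str:
    # default_storage is whichever backend your settings name: the local
    # filesystem by default, or S3 when configured via django-storages.
    return default_storage.save(name, ContentFile(data))

def blob_url(name: str) -> str:
    # The URL also comes from the backend, so migrating blobs to S3 later
    # changes the served URLs without changing this code.
    return default_storage.url(name)
```

This does not solve the authorization sub-aspect by itself: to keep files away from unauthorized users you would still serve them through a view that checks permissions before streaming the file or handing out a (possibly signed) URL.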
In a project, we will create a configuration file for each client (it could also be an SQLite database on each client instead of a configuration file). These files will include critical information such as policies. Therefore the end user must not be able to add, delete, or change anything in the configuration file.
I am considering using Active Directory to prevent users from opening the folder that contains the configuration file.
Is there a standard way to secure configuration files?
EDIT:
Of course, the speed of reading the file is as important as security.
EDIT2:
I can't do this with a DB server because my policies must be accessible without an internet connection too. A server will update the file or the SQLite tables periodically. And I am using C++.
I'm sorry to crush your hopes and dreams, but if your security is based on that configuration file on the client, you're screwed.
The configuration file is loaded and decrypted by your application, which means its values can be changed with special tools while the application runs.
If security is important, do those checks on the server, not the client.
Security is a fairly broad matter. What happens if your system is compromised? Does someone lose money? Does someone get extra points in a game? Does someone gain access to nuclear missile launch codes? Does someone's medical data get exposed to the public?
All of these are more or less important security concerns, but as you can imagine, nuclear missile launch codes have far stricter requirements than a game where someone may boost their score, and money and health data obviously land somewhere in the middle of that range, with lots of other things we could add to the list.
It also matters what type of "users" you are trying to protect against. Is it national-level security experts (e.g. FBI, CIA, KGB), hobby hackers, or just normal computer users? Encrypting the file will stop a regular user, and perhaps a hobby hacker, but national-level experts certainly won't be foiled by it.
Ultimately, though, if the machine holding the data also knows how to read the data, you cannot have a completely secure system. The scheme can be bypassed by reading the key out of the code and re-implementing whatever de-/encryption makes up your "security". And once the data is in plain text, it can be modified, re-encrypted, and stored back.
You can of course make it more convoluted, which means someone will need a stronger motive to work their way through your convoluted methods, but in the end it comes down to: if the machine knows how to decrypt something, someone with access to the machine can decrypt the content.
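A toy illustration of that point (Python and the cryptography package used purely for demonstration; the key and paths are made up): whatever key decrypts the config necessarily ships with the application, so anyone holding the binary can run the very same calls.

```python
from cryptography.fernet import Fernet

# This key must be embedded in (or derivable from) the shipped application,
# so an attacker with the binary can extract it and reuse it.
EMBEDDED_KEY = b"0GQCLz0cGYIB1rSbF8Gm-Fy5TS3V9Jv0P-3mvNfvM9E="  # made-up example

def read_config(path: str) -> bytes:
    with open(path, "rb") as f:
        return Fernet(EMBEDDED_KEY).decrypt(f.read())

def tamper_config(path: str, new_plaintext: bytes) -> None:
    # Decrypt, modify, re-encrypt, write back: the "protection" is undone
    # using nothing but the application's own key.
    with open(path, "wb") as f:
        f.write(Fernet(EMBEDDED_KEY).encrypt(new_plaintext))
```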
It is up to you (and obviously the "customers" and/or "partners" whose data you are looking after) whether that is a risk you can accept.
I have a PHP application on one Amazon instance for uploading and transcoding audio files. The application first uploads the file, then transcodes it, and finally puts it in an S3 bucket. At the moment, the application shows the progress of uploading and transcoding via repeated AJAX requests that monitor the file size in a temporary folder.
I keep wondering what happens if users rush to my service tomorrow and I need to scale it in AWS by any means available.
A: What will happen to my upload and transcoding technique?
B: If I add more instances, does that mean I will have different files in different temporary conversion folders in different physical places?
C: If I want to get the file size by AJAX from http://www.example.com/filesize until the process finishes, do I need the real address of each EC2 instance (I mean IP, DNS), or all of the instances' folders (or folder)?
D: When we scale, what happens to the temporary folder? Is it correct that all instances, apart from their own LAMP stacks, point to one root folder on the main instance?
I have some basic knowledge of scaling with other hosting techniques, but on Amazon these questions are on my mind.
Thanks for any advice.
It is difficult to answer your questions without knowing considerably more about your application architecture, but given that you're using temporary files, here's a guess:
A: Your ability to scale depends entirely on your architecture, and of course on having a wallet deep enough to pay for it.
B: Yes. If you're generating temporary files on individual machines, they won't be stored in a shared place the way you currently describe it.
C: Yes. You need some way to know where the files are stored. You might be able to get around this with an ELB stickiness policy (i.e. traffic through the ELB gets routed to the same instance), but those are kind of a pain and won't necessarily solve your problem.
D: Not quite sure what the question is here.
As it sounds like you're in the early days of your application, give this tutorial and this tutorial a peek. The first one describes a thumbnailing service built on Amazon SQS, the second a video processing one. They'll help you design with best AWS practices in mind, and help you avoid many of the issues you're worried about now.
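The core pattern in both tutorials is a queue that decouples the web tier from the workers, so any instance can pick up any job. A minimal sketch of that idea (Python/boto3 rather than PHP, and the queue URL is made up):

```python
import json
import boto3  # assumes AWS credentials are already configured

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/transcode-jobs"  # made up

def enqueue_transcode_job(job_id: str, s3_key: str) -> None:
    """Web tier drops a job on the queue instead of transcoding in place."""
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps({"job_id": job_id, "s3_key": s3_key}),
    )

def work_one_job() -> None:
    """Any worker instance polls the queue, transcodes, then deletes the message."""
    resp = sqs.receive_message(
        QueueUrl=QUEUE_URL, MaxNumberOfMessages=1, WaitTimeSeconds=20
    )
    for msg in resp.get("Messages", []):
        job = json.loads(msg["Body"])
        # ... download job["s3_key"] from S3, transcode, upload the result ...
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```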
One way you could get around scaling and session stickiness is to have the transcoding process update a database with its current progress. Any returning user checks the database to see the progress of their upload. There is no need to keep track of where the transcoding is taking place, since the progress is stored in a single place.
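A minimal sketch of that idea (Python for brevity; the local SQLite file stands in for a shared database server such as RDS, which is what you would actually point every instance at, and the table name is invented):

```python
import sqlite3

# Stand-in for a shared database: in production every web and worker
# instance would connect to the same DB server, so progress is visible
# regardless of which instance handles the AJAX request.
conn = sqlite3.connect("progress.db")
conn.execute("CREATE TABLE IF NOT EXISTS jobs (job_id TEXT PRIMARY KEY, percent INTEGER)")

def set_progress(job_id: str, percent: int) -> None:
    """Called by whichever instance is doing the transcoding."""
    conn.execute(
        "INSERT INTO jobs (job_id, percent) VALUES (?, ?) "
        "ON CONFLICT(job_id) DO UPDATE SET percent = excluded.percent",
        (job_id, percent),
    )
    conn.commit()

def get_progress(job_id: str) -> int:
    """Called by the AJAX progress endpoint on any instance."""
    row = conn.execute(
        "SELECT percent FROM jobs WHERE job_id = ?", (job_id,)
    ).fetchone()
    return row[0] if row else 0
```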
However, as Christopher said, we don't really know anything about your application; any advice we give is really looking from the outside in, and we don't have a good idea of what would be easiest for you. This seems like a pretty simple solution, but I could be missing something, since I don't know anything about your application or architecture.