Passing a serialized object through a URL - django

I am serializing/pickling an object, encoding it as a compressed string, and passing it as a parameter in the URL for the next page to deserialize. My web app does not have a database; I am doing this because the app gets data from external web services, which are slow.
Is this acceptable practice? Is this a security risk? Is there a way to make this secure?

If you need to share data between views, do it with the session. That's what sessions are made for. Session data is stored in the database by default, but it doesn't have to be: you can also use the filesystem, a caching system (Memcached, Redis, etc.), or signed cookies (Django 1.4+ only).
See:
Configuring the Session Engine
How to Use Sessions
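As a rough sketch of the session approach: in a Django view, `request.session` behaves like a dictionary, so the slow web-service result can be stored under a key in one view and read back in the next. The helper names `stash_result` and `load_result` and the key `"service_result"` are made up for illustration, and the helpers accept any mutable mapping so they can be tried without Django installed.

```python
import json

SESSION_KEY = "service_result"  # hypothetical key name, not a Django setting


def stash_result(session, data):
    """Store JSON-serializable data in the session (e.g. request.session)."""
    # Round-trip through JSON so only plain, static values end up in the session.
    session[SESSION_KEY] = json.loads(json.dumps(data))


def load_result(session, default=None):
    """Read the stored data back in a later view."""
    return session.get(SESSION_KEY, default)


# Inside Django views this would look roughly like:
#   stash_result(request.session, result_from_web_service)   # first view
#   result = load_result(request.session)                    # next view
```

Only the session key travels in the cookie (unless the signed-cookie backend is used), so nothing has to be packed into the URL.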

Is this a security risk?
If the serialisation you are using is pickle, then yes, that is definitely a problem, as the documentation warns:
Never unpickle data received from an untrusted or unauthenticated source
Use a form of serialisation designed only to hold safe static values (e.g. JSON).
You can protect a value that you send to the client side from tampering by signing it with a MAC, e.g. using hmac. You may need to add other properties to the MAC-signed data, such as a username or timestamp, to prevent signed data blocks from being freely interchangeable, if that's a threat to whatever integrity you are trying to achieve.
If you also need to protect the value from being viewed and interpreted by the client-side user, you would need to use an encryption algorithm (e.g. AES, which is not part of the stdlib) in addition to the signing.
(I still wouldn't personally trust a MAC-signed and encrypted pickle. Even though it would need the server-side secret to be leaked to make it exploitable, you don't really want an information-leakage vulnerability to escalate to an arbitrary-code-execution vulnerability, which is what pickle represents.)
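A minimal sketch of the MAC approach described above, using only the standard library: the data is serialized as JSON, bundled with a username and timestamp, and signed with HMAC-SHA256. The function names, the token layout, the 300-second lifetime, and `SECRET_KEY` are all illustrative assumptions. The token is signed but not encrypted, so the client can still read it.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET_KEY = b"replace-with-a-long-random-server-side-secret"  # assumption


def sign_payload(data, username, secret=SECRET_KEY, now=None):
    """Return a text token: base64(JSON body) + "." + hex HMAC-SHA256."""
    issued = int(now if now is not None else time.time())
    body = {"data": data, "user": username, "ts": issued}
    raw = base64.urlsafe_b64encode(json.dumps(body, sort_keys=True).encode("utf-8"))
    mac = hmac.new(secret, raw, hashlib.sha256).hexdigest()
    return raw.decode("ascii") + "." + mac


def verify_payload(token, username, max_age=300, secret=SECRET_KEY, now=None):
    """Return the original data, or raise ValueError if the token is invalid."""
    raw, sep, mac = token.rpartition(".")
    if not sep:
        raise ValueError("malformed token")
    expected = hmac.new(secret, raw.encode("utf-8"), hashlib.sha256).hexdigest()
    # Constant-time comparison; check the MAC before decoding anything.
    if not hmac.compare_digest(mac.encode("utf-8"), expected.encode("ascii")):
        raise ValueError("bad signature")
    body = json.loads(base64.urlsafe_b64decode(raw))
    current = now if now is not None else time.time()
    if body["user"] != username or current - body["ts"] > max_age:
        raise ValueError("token expired or issued to another user")
    return body["data"]
```

The token may contain `=` padding, so it should still be URL-encoded when placed in a query string. In a Django project, `django.core.signing.dumps` and `django.core.signing.loads` offer signed, timestamped JSON tokens along the same lines.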

It is not the best option, since URL parameters will show up in server logs. You're probably better off sending the data with the POST method or, better yet, creating a rudimentary database (if you don't have access to anything else, use SQLite) and just passing the ID to the next screen.
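The "rudimentary database plus ID" idea could look something like the sketch below, which uses the standard library's `sqlite3` module. The table name `results` and the helpers `open_store`, `save_result`, and `fetch_result` are invented for the example, and a random ID is used so that results cannot be enumerated by guessing.

```python
import json
import sqlite3
import uuid


def open_store(path=":memory:"):
    """Open (or create) the SQLite file that holds the slow service results."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results (id TEXT PRIMARY KEY, body TEXT NOT NULL)"
    )
    return conn


def save_result(conn, data):
    """Store the data as JSON and return the ID to pass to the next page."""
    result_id = uuid.uuid4().hex  # random 32-character hex string
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO results (id, body) VALUES (?, ?)",
            (result_id, json.dumps(data)),
        )
    return result_id


def fetch_result(conn, result_id):
    """Return the stored data, or None if the ID is unknown."""
    row = conn.execute(
        "SELECT body FROM results WHERE id = ?", (result_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None
```

The next page then receives only the ID, for example as `?result=<id>`, and looks the data up server-side.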

Related

Why is storing the result of getSignedUrl in a database a bad idea?

I have started to work on an existing project where the result of getSignedUrl and its expiration time are persisted in the database. My intuition tells me that this is wrong, but I can't clearly explain why or what a better alternative would be. One reason I think this approach is bad is that it requires data modification during read-only queries (i.e. if a URL has expired, it has to be updated in the database). Another reason is that this feels like storing a computed value that is not even that expensive to compute; if it ever has to be optimized, I would expect a separate cache mechanism to handle that instead of the database. Is my reasoning correct? How can I explain this better to my colleagues?
This really depends on the use case and on why it's being stored in the database. However, as you point out, it adds extra latency for the user, since presumably you would be querying the database or a cache just to retrieve static images.
Signed URLs can be reused and there is nothing wrong with that, but if every asset has its own signed URL, this adds to the maintenance of the application. There are two ways to keep the stored URLs valid:
A scripted action constantly cycling through assets to repopulate the data store of signed URLs.
Writing a fresh URL to the database during the user workflow whenever the stored link has expired.
Needless to say, neither of these is ideal. If these are primarily frontend assets, I would recommend looking at whether you can add CloudFront in front of your origin and make use of signed cookies instead.
This provides similar functionality to a signed URL, but the application instead generates a cookie that grants the user access to the assets, without needing to either generate a new signed URL every time or look one up in a database. Both of those options add latency and hurt the user experience.
If the signed URLs are for reports or generated content, I believe they should be generated when a user requests them (although this is my opinion). This makes it easier to look back through any auditing to determine when the action was requested, and it requires the user to authenticate again once the URL has expired.
Also consider that if the signed URL resolves to sensitive content and is stored in plain text in a database that your developers have access to, they would be able to retrieve this information.
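If generating URLs on demand ever did become a bottleneck, the separate cache mechanism mentioned in the question could be sketched as a small in-memory cache that re-signs shortly before expiry. This is only an illustration: `SignedUrlCache` is a made-up class, and `sign` stands in for whatever call the project uses to produce a signed URL (it is assumed here to take a key and a lifetime in seconds).

```python
import time


class SignedUrlCache:
    """Keep signed URLs in memory and re-sign them shortly before they expire."""

    def __init__(self, sign, ttl=900, margin=60, clock=time.time):
        self._sign = sign        # hypothetical callable: sign(key, ttl) -> URL string
        self._ttl = ttl          # lifetime requested for each signed URL, in seconds
        self._margin = margin    # re-sign this many seconds before expiry
        self._clock = clock      # injectable for testing
        self._entries = {}       # key -> (url, expires_at)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] - self._margin > now:
            return entry[0]      # still comfortably valid, reuse it
        url = self._sign(key, self._ttl)
        self._entries[key] = (url, now + self._ttl)
        return url
```

Reads never touch the database, and an expired entry is simply replaced in memory, which avoids the write-on-read problem described in the question.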

Should I store user-specific information for another entity in my database?

I want to access the HTTP APIs of a server from a Lambda function, but the HTTP API expects some encrypted credential information (let's say a token) in the headers. A unique token is generated for every user. Should I store this token for each user in a database such as DynamoDB?
Is that good practice?
Or is there a service that can be used for user management?
Or else, what can I do?
Well, you have to store it somewhere, and I guess you will probably need to load user information anyway, so you might as well store it with that, wherever that might be.
Of course, it is important that you keep all your user information secure and keeping it all in one place, including this access token, makes it easier to keep it secure, as opposed to storing it separately from your other user info.
And yes, DynamoDB is a very common choice for storing user info on AWS. Slower than SQL options, but still a good and common choice.

Using Django sessions to store logged in user

I'm creating a REST-centric application that will use a NoSQL data store of some kind for most of the domain-specific models. For the primary site that I intend to build around the REST data framework, I still want to use a traditional relational database for users, billing info, and other metadata that's outside the scope of the domain data model.
I've been advised that this approach is only a good idea if I can avoid performing I/O to both the RDBMS and NoSQL data stores on the same request as much as possible.
My questions:
Is this good advice? (I'm assuming so, but the rest of these questions are useless if the first premise is wrong.)
I'd like to cache at least the logged on user as much as possible. Is it possible to use Django sessions to do this in a way that is secure, reliably correct, and fault-tolerant? Ideally, I would like to have the session API be a safe, drop-in replacement for retrieving the current user with as little interaction with the users table as possible. What legwork will I need to do to hook everything up?
If this ends up being too much of a hassle, how easy is it to store user information in the NoSQL store (that is, eliminate the RDBMS completely) without using django-nonrel? Can custom authentication/authorization backends do this?
I'm pondering using the same approach for my application and I think it is generally safe but requires special care to tackle cache consistency issues.
The way Django normally operates is that when a request is received, a query is run against the Session table to find the session associated with the cookie from the request. Then, when you access request.user, a query is run against the User table to find the user for that session (if any, because Django supports anonymous sessions). So, by default, Django needs two queries to associate each request with a user, which is expensive.
A nice thing about the Django session is that it can be used as a key-value store without extending any model class (unlike, for example, the User class, which is hard to extend with additional fields). So you can, for example, write request.session['email'] = user.email to store additional data in the session. This is safe in the sense that what you read from the request.session dictionary is guaranteed to be what you put there; the client has no way to change these values. So you can indeed use this technique to avoid a query to the User table.
To avoid query to the Session table, you need to enable session caching (or store session data in the client cookie with django.contrib.sessions.backends.signed_cookies, which is safe, because such cookies are cryptographically protected against modification by a client).
With caching enabled, you need 0 queries to associate a request with user data. But the problem is cache consistency. If you use a local in-memory cache with the write-through option (django.core.cache.backends.locmem.LocMemCache with django.contrib.sessions.backends.cached_db), the session data will be written to the DB on each modification, but it won't be read from the DB if it is present in the cache. This introduces a problem if you have multiple Django processes. If one process modifies a session (for example, changes session['email']), another process can still use an old, cached value.
You can solve it by using shared cache (Memcached backend), which guarantees that changes done by one process are visible to all other processes. In this way, you are replacing a query to a Session table with a request to a Memcached backend, which should be much faster.
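For reference, the shared-cache setup might look like the settings fragment below. This is a sketch rather than a tested configuration: the Memcached address is an assumption, and `PyMemcacheCache` is the backend name in Django 3.2 and later (older versions ship differently named Memcached backends).

```python
# settings.py (fragment)

# Write-through sessions: reads come from the cache, writes also go to the database.
SESSION_ENGINE = "django.contrib.sessions.backends.cached_db"
SESSION_CACHE_ALIAS = "default"

CACHES = {
    "default": {
        # Shared Memcached instance, visible to every Django process.
        "BACKEND": "django.core.cache.backends.memcached.PyMemcacheCache",
        "LOCATION": "127.0.0.1:11211",  # assumed Memcached host and port
    }
}
```

Switching `SESSION_ENGINE` to `django.contrib.sessions.backends.cache` would drop the database write entirely, at the cost of losing sessions whenever the cache is flushed.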
Storing session data in a client cookie can also solve cache consistency issues. If you modify the email field in the cookie, all future requests sent by the client should carry the new email. However, the client can deliberately send an old cookie, which still carries the old values. Whether this is a problem is application dependent.

Django: Securing / encrypting stored files

In a Django project, I want to keep user uploaded files secure on the server. Should this be done at the OS level (we are using ubuntu) or at the application level?
Encrypting at the application level will be easier to maintain. But, aside from some drawbacks like possible negative effect on performance, I am not even sure if this will have any point. If a hacker compromises the server, he will also have access to the encryption keys and how it is encrypted / decrypted.
Any suggestions are greatly appreciated. Thanks.
How you protect your data depends on what kinds of attacks you want to protect against. Of course, you probably don't know how an attacker is most likely to compromise your system, unless there are certain threat models you're particularly trying to protect against, like say a rogue sysadmin.
The attacker might gain access to the OS that the web server is running on. In this case, filesystem level encryption probably does you no good. In fact file-system level encryption is probably only useful protection against somebody walking off with the physical server (which is a totally valid threat model). However, if the files are encrypted with keys stored in the database, then an attacker who has access to the webserver OS but not the database is thwarted.
In contrast, an attacker might gain access to the database but not the OS, through a hole in your application. I would expect this to be less likely since modern operating systems present huge and well-studied attack surfaces.
Protecting your users' data against an attacker with full access to your servers is very difficult. You need to encrypt the data with a key that your servers don't have. This could be something like a password or a key stored in a user cookie. The problem with all these schemes is that users can't be trusted to hold on to critical data like this -- they always want a way to reset their password if they forget. In most cases, it's not realistic to protect data against an attacker with full access to your OS and your database.
So I'd choose what you're trying to protect against. Personally, I'd expect an OS penetration to be most likely, and thus encrypt the files with keys that are stashed in a part of the database that is extra protected somehow. The challenge here is that the OS has to store database login credentials (in settings.py) in order for the web app to function. So try to keep those files as restricted as possible within the OS i.e. chmod 600 on a user account that does as little else as possible.
You're right that if the key used to encrypt the files is stored on the server you don't get a whole lot of added security by encrypting the files.
However, if you use a key provided by the user, then you do get some security. For example, if you store the encryption key in a cookie, then it will only be available for the duration of each request. I don't believe this will create any new security issues (if an attacker can steal the cookie, they can also steal the user's session), and it will make it much harder for an attacker to access files belonging to users who aren't currently online.
If you're really paranoid, you could do what 1Password does, and send encrypted data back to the browser, which can decrypt it with JavaScript encryption routines…

comparison of ways to maintain state

There are various ways to maintain user state in web development.
These are the ones that I can think of right now:
Query String
Cookies
Form Methods (Get and Post)
Viewstate (ASP.NET only I guess)
Session (InProc Web server)
Session (Dedicated web server)
Session (Database)
Local Persistence (Google Gears) (thanks Steve Moyer)
etc.
I know that each method has its own advantages and disadvantages like cookies not being secure and QueryString having a length limit and being plain ugly to look at! ;)
But, when designing a web application I am always confused as to what methods to use for what application or what methods to avoid.
What I would like to know is what method(s) do you generally use and would recommend or more interestingly which of these methods would you like to avoid in certain scenarios and why?
While this is a very complicated question to answer, I have a few quick-bite things I think about when considering implementing state.
Query string state is only useful for the most basic tasks -- e.g., maintaining the position of a user within a wizard, perhaps, or providing a path to redirect the user to after they complete a given task (e.g., logging in). Otherwise, query string state is horribly insecure, difficult to implement, and in order to do it justice, it needs to be tied to some server-side state machine by containing a key to tie the client to the server's maintained state for that client.
Cookie state is more or less the same -- it's just fancier than query string state. But it's still totally maintained on the client side unless the data in the cookie is a key to tie the client to some server-side state machine.
Form method state is again similar -- it's useful for hiding fields that tie a given form to some bit of data on the back end (e.g., "this user is editing record #512, so the form will contain a hidden input with the value 512"). It's not useful for much else, and again, is just another implementation of the same idea behind query string and cookie state.
Session state (in any of the forms you describe) is great, since it's infinitely extensible and can handle anything your chosen programming language can handle. The first caveat is that there needs to be a key in the client's hand to tie that client to its state being stored on the server; this is where most web frameworks provide either a cookie-based or query string-based key back to the client. (Almost every modern one uses cookies, but falls back on query strings if cookies aren't enabled.) The second caveat is that you need to put some thought into how you're storing your state... Will you put it in a database? Does your web framework handle it entirely for you? Again, most modern web frameworks take the work out of this, and for me to go about implementing my own state machine, I need a very good reason... Otherwise, I'm likely to create security holes and functionality breakage that have been hashed out over time in any of the mature frameworks.
So I guess I can't really imagine not wanting to use session-based state for anything but the most trivial reason.
Security is also an issue; values in the query string or form fields can be trivially changed by the user. User authentication should be saved either in an encrypted or tamper-evident cookie or in the server-side session. Keeping track of values passed in a form as a user completes a process, like a site sign-up, well, that can probably be kept in hidden form fields.
The nice (and sometimes dangerous) thing, though, about the query string is that the state can be picked up by anyone who clicks on a link. As mentioned above, this is dangerous if it gives the user some authorization they shouldn't have. It's nice, though, for showing your friends something you found on the site.
With the increasing use of Web 2.0, I think there are two important methods missing from your list:
8 AJAX applications - since the page doesn't reload and there is no page to page navigation, state isn't an issue (but persisting user data must use the asynchronous XML calls).
9 Local persistence - Browser-based applications can persist their user data and state to the local hard drive using libraries such as Google Gears.
As for which one is best, I think they all have their place, but the Query String method is problematic for search engines.
Personally, since almost all of my web development is in PHP, I use PHP's session handlers.
Sessions are the most flexible, in my experience: they're normally faster than db accesses, and the cookies they generate die when the browser closes (by default).
Avoid InProc if you plan to host your website on a cheap-n-cheerful host like webhost4life. I've learnt the hard way that because their systems are oversubscribed, they recycle the applications very frequently, which causes your session to get lost. Very annoying.
Their suggestion is to use StateServer, which is fine except that you have to serialise/deserialise the session on each postback. I love objects and my web app is full of them. I'm concerned about performance when switching to StateServer. I need to refactor to only put the stuff I really need in the session.
Wish I'd known that before I started...
Cheers, Rob.
Be careful what state you store client side (query strings, form fields, cookies). Anything security-related should not be stored client-side, except maybe a session identifier if it is reasonably obscured and hard to guess. There are too many websites that have settings like "authenticated=true" and store those in a cookie or query string or hidden form field. It is trivial for a user to bypass something like that. Remember that ANY input coming from a client could have been tampered with and should not be trusted.
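To illustrate the difference, here is a small sketch in which the client only ever holds a random, hard-to-guess session identifier, while the "authenticated" flag lives on the server. The in-memory dictionary and the function names are placeholders for the example; a real site would rely on its framework's session machinery and a shared store.

```python
import secrets

# session_id -> server-side state; stands in for a real session store
_SESSIONS = {}


def start_session(user_id):
    """Create server-side state after a successful login and return the ID for the cookie."""
    session_id = secrets.token_urlsafe(32)  # 32 random bytes, URL-safe text
    _SESSIONS[session_id] = {"user_id": user_id, "authenticated": True}
    return session_id


def is_authenticated(session_id):
    """Trust only what the server stored, never a flag sent by the client."""
    state = _SESSIONS.get(session_id)
    return bool(state and state["authenticated"])
```

A client that sends `authenticated=true`, or any other made-up value, simply fails the lookup.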
Use signed cookies linked to some sort of database store for when you need to grab data. There's no reason to be storing data on the client side if you have a connected back end; you're just looking for trouble if this is a public-facing website.
It's not so much a question of what to use and what to avoid, but of when to use which. Each has a particular circumstance in which it is the best, and a different circumstance in which it's the worst.
The deciding factor is generally lifetime of the data. Session state lives longer than form fields, and so on.