Why is storing the result of getSignedUrl in a database a bad idea?


I have started working on an existing project where the result of getSignedUrl, plus its expiration time, is persisted in the database. My intuition tells me that this is bad and wrong, but I can't clearly explain why, or what a better alternative would be. One reason I think this approach is bad is that it requires data modification on read-only queries (i.e. if a URL has expired, it has to be updated in the database). Another reason is that this feels to me like storing a computed value that is not even expensive to compute; if it ever had to be optimized, I would expect an additional cache mechanism to handle it rather than the database. Is my reasoning correct? How can I better explain this to my colleagues?
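The cache the question alludes to could look like the sketch below: URLs live only in process memory and are re-signed shortly before they expire, so read-only requests never write to the database. It is a minimal sketch under assumptions: SignedUrlCache, its parameters and the sign callback are hypothetical (sign would wrap whatever the project already calls, e.g. getSignedUrl), and it is neither thread-safe nor shared between processes.

```python
import time
from typing import Callable, Dict, Tuple


class SignedUrlCache:
    """Hypothetical in-memory cache of signed URLs; nothing is written to the DB."""

    def __init__(self, sign: Callable[[str, int], str], ttl: int = 900,
                 margin: int = 60, clock: Callable[[], float] = time.time):
        # sign(key, ttl) is an assumed signer returning a URL valid for ttl seconds.
        self._sign = sign
        self._ttl = ttl
        self._margin = margin  # re-sign this many seconds before the real expiry
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # key -> (url, expires_at)

    def get(self, key: str) -> str:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[1] - self._margin:
            return entry[0]  # still comfortably valid: reuse it
        url = self._sign(key, self._ttl)  # cheap to recompute on demand
        self._entries[key] = (url, now + self._ttl)
        return url
```

If the process restarts, the cache is simply rebuilt on the next requests, which is the point: the URL is derived data, not state worth persisting.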

This really depends on the use case, i.e. why it's being stored in the database. However, as you point out, it adds extra latency for the user, since presumably you would be querying the database (or a cache) just to serve static images.
Whilst signed URLs can be reused, and there is nothing wrong with that, if every asset has its own signed URL this adds to the maintenance burden of the application. There are two ways to keep the stored URLs fresh:
A scripted job that constantly cycles through the assets to repopulate the data store of signed URLs.
Writing a new URL to the database during the user's request whenever the stored link has expired.
Needless to say, neither of these is ideal. If these are primarily frontend assets, I would recommend looking at whether you can put CloudFront in front of your origin and use signed cookies instead.
This provides similar functionality to a signed URL, but instead the application generates a cookie for the user that grants access to the assets, without needing either to generate a new signed URL every time or to look one up in a database. Both of those steps add latency and detract from the user experience.
If the signed URLs are for reports or generated content, I believe they should be generated when a user requests them (although this is my opinion). This makes it easier to look back through any auditing to determine when the action was requested, and it requires the user to authenticate again once the URL has expired.
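Generating a link at request time is cheap, which is a large part of why persisting it buys little. The sketch below uses a plain HMAC-SHA256 over the path and expiry time as a stand-in; it is not the SigV4 scheme S3 actually uses (with the real SDK you would call getSignedUrl, or boto3's generate_presigned_url, inside the request handler instead). SECRET, sign_url and verify_url are hypothetical names.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit

SECRET = b"hypothetical-server-side-secret"  # assumption: never stored in the DB


def _signature(path: str, expires: int) -> str:
    msg = f"{path}\n{expires}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()


def sign_url(path: str, ttl: int, now: Optional[float] = None) -> str:
    """Build a link for `path` that stops working `ttl` seconds from now."""
    now = time.time() if now is None else now
    expires = int(now) + ttl
    query = urlencode({"expires": expires, "sig": _signature(path, expires)})
    return f"{path}?{query}"


def verify_url(url: str, now: Optional[float] = None) -> bool:
    """Accept the link only if it has not expired and the signature matches."""
    now = time.time() if now is None else now
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    try:
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError, IndexError):
        return False  # missing or malformed parameters
    if now > expires:
        return False
    # Constant-time comparison so the signature can't be guessed byte by byte.
    return hmac.compare_digest(sig, _signature(parts.path, expires))
```

Because every link is derived from the path, the expiry and a server-side secret, there is nothing to refresh in a table, and each request that mints a link can be audited on its own.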
Also consider that if the signed URL resolves to sensitive content and is stored in plain text in a database that your developers have access to, they would be able to retrieve that content.

Related

Drupal 8 - What is PrivateTempStoreFactory and its purpose?

Is this the same as PHP $_SESSION?
Does it use PHP $_SESSION?
When should I use it?
What are some downsides of using it?
Can or should I use it to store user input and results of forms for later operations?
And is it secure?

The private temporary storage differs from session storage in very significant ways and it is not intended as a replacement of it.
For logged-in users, sessions are completely irrelevant: the data stored in the temporary storage is shared among all sessions of a given user, current and future.
Only for anonymous users are sessions relevant: should their session expire, the contents are no longer retrievable, because they are tied to the session ID. But even for anonymous users the data is not stored in the session storage; only the session ID matters.
The data expires after a time set by the container parameter tempstore.expire, which has nothing to do with the session cookie lifetime (nor is the latter relevant for logged-in users).
There is metadata associated with each piece of data: the owner (either the logged in user id or a session id) and the updated time.
The durability expectations differ completely. Sessions are fundamentally ephemeral. Many sites tie sessions to IP addresses, and a session is certainly tied to a browser and therefore to a device. As a corollary, if clients can't expect sessions to last, there's no reason for the server to cling to them: putting the session storage on fast but less durable storage (say, memcached) is a perfectly valid speedup strategy. The private temporary storage, however, is durable (within the expire time, of course). A typical thing to store in a session is a "flash" message, the one you set with drupal_set_message. If you set one and then the session gets lost, oh well. Yeah, informing the user would've been nice, but oh well. I certainly wouldn't expect a flash message to follow me across browsers and devices.
In theory, a typical thing to store in the private temp storage would be a shopping cart. In practice, this is not done because a) carts are valuable, not temporary, data (if not for the end user, then for the back office), and b) when a user logs in, their session data is migrated but their private temp storage is not. Whether this is a bug is debatable; at the time of this writeup I can't find a core issue about it. This is a possible downside. So a complex edit like the one in Views UI is one possible use case, but note that Views UI itself uses the shared temporary storage facility, not the private one. In fact, the only usage I can find is node previews.
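To make the ownership and expiry rules in this answer concrete, here is a small Python model of them. It is not Drupal's PHP API, only an illustration: PrivateTempStoreModel and its methods are hypothetical, the owner is the user ID when logged in and the session ID otherwise, and 604800 seconds (one week), which I believe is core's default for tempstore.expire, is an assumption.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class PrivateTempStoreModel:
    """Hypothetical Python model of Drupal's private tempstore semantics (not its API)."""

    def __init__(self, expire: int = 604800, clock: Callable[[], float] = time.time):
        self._expire = expire  # seconds; plays the role of tempstore.expire (assumed default)
        self._clock = clock
        self._items: Dict[Tuple[str, str, str], Dict[str, Any]] = {}

    @staticmethod
    def owner(user_id: Optional[int], session_id: Optional[str]) -> str:
        # Logged-in users own data by user ID, so every session of theirs sees it;
        # anonymous users own it by session ID, so losing the session loses the data.
        if user_id:
            return f"uid:{user_id}"
        if session_id:
            return f"sid:{session_id}"
        raise ValueError("anonymous visitor without a session")

    def set(self, collection: str, owner: str, key: str, value: Any) -> None:
        # Each item carries metadata: its owner and when it was last updated.
        self._items[(collection, owner, key)] = {
            "owner": owner, "updated": self._clock(), "data": value}

    def get(self, collection: str, owner: str, key: str) -> Optional[Any]:
        item = self._items.get((collection, owner, key))
        # Expiry is measured from the last update, independent of any cookie lifetime.
        if item is None or self._clock() - item["updated"] > self._expire:
            return None
        return item["data"]
```

Note what the model does not do: nothing copies anonymous "sid:" items to the "uid:" owner at login, which mirrors downside b) above.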

Here is a very good article about Storing Session Data with Drupal 8.
It covers all your questions and more!
Take a look at it; the author also gives you a lot of other links to help you.
Here is a short summary:
1. Is this the same as PHP $_SESSION?
Roughly equivalent. But (and it's an important but) using Drupal 8 services provides needed abstraction and structure for interacting with a global construct. It's part of an overall architecture that allows developers to build and extend complex applications sustainably.
2. When should I use it?
In past versions of Drupal, I might have just thrown the data in $_SESSION. In Drupal 8 there's a service for that; actually, two services: use user.private_tempstore and user.shared_tempstore for temporarily storing user-specific and non-user-specific data, respectively.
3. What are some down sides of using it?
Having to know object-oriented programming (OOP).
4. Can or should I use it to store user input in forms for later operations?
Yes, you should.

Using Django sessions to store logged in user

I'm creating a REST-centric application that will use a NoSQL data store of some kind for most of the domain-specific models. For the primary site that I intend to build around the REST data framework, I still want to use a traditional relational database for users, billing info, and other metadata that's outside the scope of the domain data model.
I've been advised that this approach is only a good idea if I can avoid performing I/O to both the RDBMS and NoSQL data stores on the same request as much as possible.
My questions:
Is this good advice? (I'm assuming so, but the rest of these questions are useless if the first premise is wrong.)
I'd like to cache at least the logged on user as much as possible. Is it possible to use Django sessions to do this in a way that is secure, reliably correct, and fault-tolerant? Ideally, I would like to have the session API be a safe, drop-in replacement for retrieving the current user with as little interaction with the users table as possible. What legwork will I need to do to hook everything up?
If this ends up being too much of a hassle, how easy is it to store user information in the NoSQL store (that is, eliminate the RDBMS completely) without using django-nonrel? Can custom authentication/authorization backends do this?

I'm pondering using the same approach for my application and I think it is generally safe but requires special care to tackle cache consistency issues.
The way Django normally operates is that when a request is received, a query is run against the Session table to find the session associated with the cookie from the request. Then, when you access request.user, a query is run against the User table to find the user for that session (if any, because Django supports anonymous sessions). So, by default, Django needs two queries to associate each request with a user, which is expensive.
A nice thing about Django sessions is that they can be used as a key-value store without extending any model class (unlike, for example, the User class, which is hard to extend with additional fields). So you can, for example, set request.session['email'] = user.email to store additional data in the session. This is safe in the sense that what you read from the request.session dictionary is certainly what you put there; the client has no way to change these values. So you can indeed use this technique to avoid a query to the User table.
To avoid the query to the Session table, you need to enable session caching (or store session data in the client cookie with django.contrib.sessions.backends.signed_cookies, which is safe, because such cookies are cryptographically protected against modification by the client).
With caching enabled, you need 0 queries to associate a request with user data. But the problem is cache consistency. If you use a local in-memory cache with the write-through option (django.core.cache.backends.locmem.LocMemCache with django.contrib.sessions.backends.cached_db), the session data will be written to the DB on each modification, but it won't be read from the DB if it is present in the cache. This introduces a problem if you have multiple Django processes: if one process modifies a session (for example, changes session['email']), another process can still use the old, cached value.
You can solve it by using shared cache (Memcached backend), which guarantees that changes done by one process are visible to all other processes. In this way, you are replacing a query to a Session table with a request to a Memcached backend, which should be much faster.
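The shared-cache setup could be configured roughly as follows in settings.py. This fragment assumes Django 3.2 or later (where the PyMemcacheCache backend exists and the pymemcache package is installed; older versions used MemcachedCache) and a memcached server at 127.0.0.1:11211, both of which are assumptions about your deployment.

```python
# settings.py (fragment): cache-backed sessions shared by all Django processes.

# Reads come from the cache when present; writes go to both the cache and the DB.
SESSION_ENGINE = "django.contrib.sessions.backends.cached_db"

# One shared memcached instance instead of a per-process LocMemCache, so a
# session change made by one process is visible to the others.
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.memcached.PyMemcacheCache",
        "LOCATION": "127.0.0.1:11211",  # assumed memcached address
    }
}

# Alternative with no server-side session storage at all
# (the client can read the data but not alter it):
# SESSION_ENGINE = "django.contrib.sessions.backends.signed_cookies"
```

Sessions use the "default" cache alias unless SESSION_CACHE_ALIAS says otherwise.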
Storing session data in a client cookie can also solve cache consistency issues. If you modify the email field in the cookie, all future requests sent by the client should carry the new email. However, a client can deliberately send an old cookie, which still carries the old values. Whether this is a problem depends on the application.

Passing a serialized object through a URL

I am serializing/pickling an object, encoding it as a compressed string, and passing it as a parameter in the URL for the next page to deserialize. My web app does not have a database; I am doing this because the app gets data from external web services, which are slow.
Is this acceptable practice? Is this a security risk? Is there a way to make this secure?

If you need to share data between views, do it with the session. That's what sessions are made for. Session info is stored in the database by default, but it doesn't have to be; you can also use the filesystem, some caching system (memcache, Redis, etc.), or signed cookies (Django 1.4+ only).
See:
Configuring the Session Engine
How to Use Sessions

Is this a security risk?
If the serialisation you are using is pickle, then yes, that is definitely a problem, as the documentation warns:
Never unpickle data received from an untrusted or unauthenticated source
Use a form of serialisation designed only to hold safe static values (eg JSON).
You can protect a value that you send to the client side from tampering by signing it with a MAC, eg using hmac. You may need to consider adding other properties to the MAC-signed data such as username or timestamp, to prevent signed data blocks being freely interchangeable, if that's a threat to whatever integrity you are trying to achieve.
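A minimal standard-library sketch of the MAC-signing approach: JSON instead of pickle, with the username and a timestamp folded into the signed data so a block issued to another user, or issued too long ago, is rejected. SECRET, dump_signed, load_signed and max_age are hypothetical names; in a Django project, django.core.signing.dumps and loads (with their salt and max_age arguments) already cover this. It only protects integrity: the client can still base64-decode and read the payload.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Any, Optional

SECRET = b"hypothetical-server-side-secret"  # assumption: known only to the server


def _mac(body: bytes) -> str:
    return hmac.new(SECRET, body, hashlib.sha256).hexdigest()


def dump_signed(value: Any, username: str, now: Optional[float] = None) -> str:
    """Serialise `value` as JSON, bind it to a user and time, and append a MAC."""
    ts = int(time.time() if now is None else now)
    body = json.dumps({"v": value, "u": username, "t": ts}, separators=(",", ":")).encode()
    return base64.urlsafe_b64encode(body).decode() + "." + _mac(body)


def load_signed(token: str, username: str, max_age: int = 3600,
                now: Optional[float] = None) -> Any:
    """Return the value, or raise ValueError if the token is forged, foreign or stale."""
    now = time.time() if now is None else now
    try:
        encoded, sig = token.rsplit(".", 1)
        body = base64.urlsafe_b64decode(encoded.encode())
    except ValueError:  # also covers binascii.Error from bad base64
        raise ValueError("malformed token")
    if not hmac.compare_digest(sig, _mac(body)):
        raise ValueError("bad signature")
    data = json.loads(body)  # only parsed after the MAC check passes
    if data["u"] != username:
        raise ValueError("issued to a different user")
    if now - data["t"] > max_age:
        raise ValueError("expired")
    return data["v"]
```

Even if SECRET leaked, the worst case here is forged JSON data, not code execution, which is the escalation the answer warns about with pickle.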
If you also need to protect the value from being viewed and interpreted by the client side user you would need to use an encryption algorithm (eg AES - not part of stdlib) in addition to the signing.
(I still wouldn't personally trust a MAC-signed and encrypted pickle. Even though it would need the server-side secret to be leaked to make it exploitable, you don't really want an information-leakage vulnerability to escalate to an arbitrary-code-execution vulnerability, which is what pickle represents.)

It is not the best option, since URL parameters will show up in server logs. You're probably better off sending the data with the POST method, or better yet, creating a rudimentary database (if you don't have access to anything else, use SQLite) and just passing the ID to the next screen.
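The "store it and pass only the ID" suggestion could be sketched with Python's built-in sqlite3 module as below. The handoff table and the open_store, save_result and load_result functions are hypothetical; the data is stored as JSON rather than pickle, and the ID is random so a visitor can't reach someone else's data by incrementing it.

```python
import json
import secrets
import sqlite3
from typing import Any, Optional


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS handoff (id TEXT PRIMARY KEY, payload TEXT NOT NULL)")
    return conn


def save_result(conn: sqlite3.Connection, data: Any) -> str:
    """Persist the slow web-service result and return an ID for the next page's URL."""
    handoff_id = secrets.token_hex(16)  # unguessable, unlike an autoincrement key
    with conn:  # commits the insert
        conn.execute("INSERT INTO handoff (id, payload) VALUES (?, ?)",
                     (handoff_id, json.dumps(data)))
    return handoff_id


def load_result(conn: sqlite3.Connection, handoff_id: str) -> Optional[Any]:
    row = conn.execute("SELECT payload FROM handoff WHERE id = ?",
                       (handoff_id,)).fetchone()
    return None if row is None else json.loads(row[0])
```

The next page then only needs something like ?id=<handoff_id> in its URL, so server logs show an opaque token instead of the data itself.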

Tracing requests of users by logging their actions to DB in django

I want to trace user's actions in my web site by logging their requests to database as plain text in Django.
I consider to write a custom decorator and place it to every view that I want to trace.
However, I have some troubles in my design.
First of all, is such a logging mechanism reasonable, or will it cause performance problems because my log table will grow rapidly?
Secondly, how should my log table be designed?
I want to store the keywords if the user calls the search view, or the item's id if the user calls the item details view.
Besides, users' IP addresses should be kept, but how can I separate users if they connect via a single IP address, as in many companies?
I am glad to explain in detail if you think my question is unclear.
Thanks

I wouldn't do that. If this is a production service then you've got a proper web server running in front of it, right? Apache, or nginx or something. That can do logging, and can do it well, and can write to a format that won't bloat your database, and there's a wealth of tools for log analysis.
You are going to have to duplicate a lot of that functionality in your decorator, such as when you want to switch it on or off, or change the log level. The only thing you'll get by doing it all in django is the possibility of ultra-fine control, such as only logging views of blog posts with id numbers greater than X or something. But generally you'd not want that level of detail, and you'd log everything and do any stripping at the analysis phase. You've not given any reason currently why you need to do it from Django.
If you really want it in a RDBMS, reading an apache log file into Postgres or MySQL or one of those expensive ones is fairly trivial.
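As a rough idea of how little code that import takes, here is a sketch that parses access-log lines and bulk-inserts them. It assumes lines in Apache's common or combined log format (the regex matches the shared prefix), uses the standard library's sqlite3 as a stand-in for Postgres or MySQL, and the access_log table and load_log function are hypothetical. For Postgres specifically, its COPY command is usually the faster route for large files.

```python
import re
import sqlite3
from typing import Iterable

# Prefix shared by Apache's common and combined formats:
# ip ident user [time] "METHOD path PROTOCOL" status size ...
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def load_log(lines: Iterable[str], conn: sqlite3.Connection) -> int:
    """Insert every parseable line into access_log; return how many were loaded."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS access_log "
        "(ip TEXT, ts TEXT, method TEXT, path TEXT, status INTEGER, size INTEGER)")
    rows = []
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip lines that don't fit the format
        size = 0 if m["size"] == "-" else int(m["size"])  # "-" means no body
        rows.append((m["ip"], m["time"], m["method"], m["path"], int(m["status"]), size))
    with conn:
        conn.executemany("INSERT INTO access_log VALUES (?, ?, ?, ?, ?, ?)", rows)
    return len(rows)
```

Search keywords and item IDs are then recoverable from the path column (e.g. /search?q=shoes) at analysis time, in line with logging everything and stripping later.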

One thing you should keep in mind is that SQL databases don't offer very good write performance (compared with reads), so if you are experiencing heavy loads you should probably look for a better in-memory solution (e.g. a key-value store like Redis).
But keep in mind, especially if you use a non-SQL solution, that you should know what you want to do with the collected data (just display something like a 'log', or do more in-depth searching/querying on the data).
If you want to identify different users from the same IP address, you should probably look for a cookie-based solution (if you are using Django's session framework, sessions are identified through a cookie by default, so you could simply use sessions). Another solution could be doing the logging 'asynchronously' via JavaScript after the page has loaded in the browser (which could give you more possibilities for identifying the user and avoid additional load when generating the page).
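Combining the question's requirements with the session-cookie suggestion, a log table might look like the sketch below: the session key tells apart visitors who share one office IP, and the keyword and item_id columns are filled only by the views they apply to. The schema, open_log and log_request are hypothetical, and sqlite3 stands in for whatever database you use. In Django the value would come from request.session.session_key, which may be None until the session has been saved.

```python
import sqlite3
import time
from typing import Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS request_log (
    id INTEGER PRIMARY KEY,
    ts REAL NOT NULL,
    ip TEXT NOT NULL,
    session_key TEXT,   -- distinguishes users sharing one IP address
    user_id INTEGER,    -- NULL for anonymous visitors
    view TEXT NOT NULL, -- e.g. 'search' or 'item_detail'
    keyword TEXT,       -- only set by the search view
    item_id INTEGER     -- only set by the item detail view
)
"""


def open_log(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def log_request(conn: sqlite3.Connection, *, ip: str, session_key: Optional[str],
                view: str, user_id: Optional[int] = None, keyword: Optional[str] = None,
                item_id: Optional[int] = None, ts: Optional[float] = None) -> None:
    """Append one row; a view decorator would call this with values from the request."""
    with conn:
        conn.execute(
            "INSERT INTO request_log (ts, ip, session_key, user_id, view, keyword, item_id) "
            "VALUES (?, ?, ?, ?, ?, ?, ?)",
            (time.time() if ts is None else ts, ip, session_key, user_id, view,
             keyword, item_id))
```

Keeping one narrow append-only table like this also makes it easy to move the writes to an async queue or a key-value store later if write load becomes the bottleneck.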

Comparison of ways to maintain state

There are various ways to maintain user state in web development.
These are the ones that I can think of right now:
Query String
Cookies
Form Methods (Get and Post)
Viewstate (ASP.NET only I guess)
Session (InProc Web server)
Session (Dedicated web server)
Session (Database)
Local Persistence (Google Gears) (thanks Steve Moyer)
etc.
I know that each method has its own advantages and disadvantages like cookies not being secure and QueryString having a length limit and being plain ugly to look at! ;)
But, when designing a web application I am always confused as to what methods to use for what application or what methods to avoid.
What I would like to know is what method(s) do you generally use and would recommend or more interestingly which of these methods would you like to avoid in certain scenarios and why?

While this is a very complicated question to answer, I have a few quick-bite things I think about when considering implementing state.
Query string state is only useful for the most basic tasks -- e.g., maintaining the position of a user within a wizard, perhaps, or providing a path to redirect the user to after they complete a given task (e.g., logging in). Otherwise, query string state is horribly insecure, difficult to implement, and in order to do it justice, it needs to be tied to some server-side state machine by containing a key to tie the client to the server's maintained state for that client.
Cookie state is more or less the same -- it's just fancier than query string state. But it's still totally maintained on the client side unless the data in the cookie is a key to tie the client to some server-side state machine.
Form method state is again similar -- it's useful for hiding fields that tie a given form to some bit of data on the back end (e.g., "this user is editing record #512, so the form will contain a hidden input with the value 512"). It's not useful for much else, and again, is just another implementation of the same idea behind query string and cookie state.
Session state (in any of the forms you describe) is great, since it's infinitely extensible and can handle anything your chosen programming language can handle. The first caveat is that there needs to be a key in the client's hand to tie that client to its state being stored on the server; this is where most web frameworks provide either a cookie-based or query string-based key back to the client. (Almost every modern one uses cookies, but falls back on query strings if cookies aren't enabled.) The second caveat is that you need to put some thought into how you're storing your state... will you put it in a database? Does your web framework handle it entirely for you? Again, most modern web frameworks take the work out of this, and for me to go about implementing my own state machine, I need a very good reason... otherwise, I'm likely to create security holes and functionality breakage that have been hashed out over time in any of the mature frameworks.
So I guess I can't really imagine not wanting to use session-based state for anything but the most trivial reason.
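A minimal sketch of the key-in-the-client's-hand pattern described in this answer: the browser only ever receives an unguessable random token (from Python's secrets module), and everything else stays on the server. SessionStore is hypothetical and uses an in-memory dict as a stand-in for the database or cache a real framework would use; it has no expiry, locking or persistence.

```python
import secrets
from typing import Any, Dict, Optional


class SessionStore:
    """Hypothetical server-side session store; the client holds only the key."""

    def __init__(self) -> None:
        self._sessions: Dict[str, Dict[str, Any]] = {}

    def create(self) -> str:
        key = secrets.token_urlsafe(32)  # unguessable; this is what goes in the cookie
        self._sessions[key] = {}
        return key

    def get(self, key: Optional[str]) -> Optional[Dict[str, Any]]:
        # Unknown or forged keys simply find nothing.
        return self._sessions.get(key) if key else None

    def destroy(self, key: str) -> None:
        self._sessions.pop(key, None)  # e.g. on logout
```

Values like a wizard step or an "authenticated" flag live in the server-side dict, so editing the cookie can at most swap one random key for another.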

Security is also an issue; values in the query string or form fields can be trivially changed by the user. User authentication should be saved either in an encrypted or tamper-evident cookie or in the server-side session. Keeping track of values passed in a form as a user completes a process, like a site sign-up, well, that can probably be kept in hidden form fields.
The nice (and sometimes dangerous) thing, though, about the query string is that the state can be picked up by anyone who clicks on a link. As mentioned above, this is dangerous if it gives the user some authorization they shouldn't have. It's nice, though, for showing your friends something you found on the site.

With the increasing use of Web 2.0, I think there are two important methods missing from your list:
8. AJAX applications: since the page doesn't reload and there is no page-to-page navigation, state isn't an issue (but persisting user data must use asynchronous XML calls).
9. Local persistence: browser-based applications can persist their user data and state to the local hard drive using libraries such as Google Gears.
As for which one is best, I think they all have their place, but the Query String method is problematic for search engines.

Personally, since almost all of my web development is in PHP, I use PHP's session handlers.
Sessions are the most flexible, in my experience: they're normally faster than db accesses, and the cookies they generate die when the browser closes (by default).

Avoid InProc if you plan to host your website on a cheap-n-cheerful host like webhost4life. I've learnt the hard way that because their systems are oversubscribed, they recycle the applications very frequently, which causes your session to get lost. Very annoying.
Their suggestion is to use StateServer, which is fine except that you have to serialise/deserialise the session on each postback. I love objects and my web app is full of them, so I'm concerned about performance when switching to StateServer. I need to refactor to put only the stuff I really need in the session.
Wish I'd known that before I started...
Cheers, Rob.

Be careful what state you store client side (query strings, form fields, cookies). Anything security-related should not be stored client-side, except maybe a session identifier if it is reasonably obscured and hard to guess. There are too many websites that have settings like "authenticated=true" and store those in a cookie or query string or hidden form field. It is trivial for a user to bypass something like that. Remember that ANY input coming from a client could have been tampered with and should not be trusted.

Signed Cookies linked to some sort of database store when you need to grab data. There's no reason to be storing data on the client side if you have a connected back-end; you're just looking for trouble if this is a public facing website.

It's not so much a question of what to use and what to avoid, but of when to use which. Each has particular circumstances in which it is the best, and different circumstances in which it's the worst.
The deciding factor is generally lifetime of the data. Session state lives longer than form fields, and so on.
