Spam proof hit counter in Django

Spam proof hit counter in Django - django

I already looked at the most popular Django hit counter solutions and none of them seem to solve the issue of spamming the refresh button.
Do I really have to log the IP of every visitor to keep them from artificially boosting page view counts by spamming the refresh button (or writing a quick and dirty script to do it for them)?
More information
So right now you can inflate your view count with the following few lines of Python code. Which is so little that you don't really need to write a script, you could just type it into an interactive session:
from urllib import urlopen
num_of_times_to_hit_page = 100
url_of_the_page = "http://example.com"
for x in range(num_of_times_to_hit_page):
urlopen(url_of_the_page)
Solution I'll probably use
To me, it's a pretty rough situation when you need to do a bunch of writes to the database on EVERY page view, but I guess it can't be helped. I'm going to implement IP logging due to several users artificially inflating their view count. It's not that they're bad people or even bad users.
See the answer about solving the problem with caching... I'm going to pursue that route first. Will update with results.
For what it's worth, it seems Stack Overflow is using cookies (I can't increment my own view count, but it increased when I visited the site in another browser.)
I think that the benefit is just too much, and this sort of 'cheating' is just too easy right now.
Thanks for the help everyone!

Logging an IP is probably the safest. It's not perfect, but it's better than cookies and less annoying to users than requiring a signup. That said, I'd recommend not bothering with saving these in a DB. Instead, use Django's low-level caching framework. The key would be the ip and the value a simple boolean. Even a file-based cache should be pretty fast, though go with memchached as the cache backend if you really expect heavy traffic.
Something like this should work:
ip = request.META['REMOTE_ADDR']
has_voted = cache.get(ip)
if not has_voted:
cache.set(ip, True)
#code to save vote goes here

There is no foolproof way of preventing someone from artificially inflating a count. Rather, there's the extent to which you're willing to spend time making it more difficult for them to do so:
Not at all (they click refresh button)
Set a cookie, check cookie to see if they were already there (they clear cookies)
Log IP addresses (the fake a different IP every time)
Require signin with an email they respond from (they sign up for multiple email accounts)
So, in the end, you just need to pick the level of effort you want to go to in order to prevent that users from abusing the system.

You could send them a cookie when they access it and then check for that cookie. It can still be gamed, but it's a bit harder.

Related

Links within email that can edit data in database

I apologize if this is trivial but I thought this would be easy to find but I think the problem is that I'm not sure what I'm looking for.
Basically, I want to send out an email to a customer who had their pickup missed by the pickup truck to help reschedule a new date or cancel the pickup all together depending on which link in the email they clicked.
Using pk's and ID's seemed like a security flaw to link into any view as the URL could be easily altered. What protocols/libraries would I need to use to accomplish such a task? Do I just assign a UUID to each customer in my database and go off that?

Having a UUID to represent the user would be fine, but keep in mind it's just a speed bump. E-mails aren't safe and can be read by a 3rd party. Even with UUID's someone can impersonate another. It sounds like it's a rather low risk issue, though. What's the worst case here? Do you have ways to mitigate it through customer support?
If you wanted to make things more secure, you'd just have a link that would require authentication to make changes to their order. It's a balance between friction with your users and security.
It seems you have the right ideas already. I'd suggest not user pk's just because they are often incremental and it's easy to just iterate through all your customers. UUIDs just increases the number space significantly that it should deter people from doing it assuming there isn't anything to be gained.

Shopping cart for anonymous users and it's protection

If my shopping cart is stored in a DB for both anonymous and registred users, what is the best way to protect it from an attack? I found a lot of discussions about where to store shopping carts but nothing about this matter.
What if some bot just doesn't store cookies and sends requests to the shopping cart over and over again. Without cookies it's different anonymous user every time thus database will grow.
Should I check the ip address and redirect to a captcha? But real users may have same ip addresses, so the algorithm should be more complicated to not disturb them.
Any ideas or links?

Captcha's are a pretty popular way to go. I'm guessing most people (like me) rather dislike them and generally can't read them, but they're generally probably pretty easy to implement and more efficient than most alternatives.
The less effort approach to checking IPs is having all anonymous users require a Captcha.
I would also suggest having a time-out (based on activity) of no more than a few hours on carts of anonymous users (after which you can probably delete it).
You'd also want an upper limit on the number of items, and, if this number is high, also possibly prevent (though Captcha?) users from adding too many items to their cart in inhumanly short succession.

Tracing requests of users by logging their actions to DB in django

I want to trace user's actions in my web site by logging their requests to database as plain text in Django.
I consider to write a custom decorator and place it to every view that I want to trace.
However, I have some troubles in my design.
First of all, is such logging mecahinsm reasonable or because of my log table will be enlarging rapidly it causes some preformance problems ?
Secondly, how should be my log table's design ?
I want to keep keywords if the user call search view or keep the item's id if the user call details of item view.
Besides, IP addresses of user's should be kept but how can I seperate users if they connect via single IP address as in many companies.
I am glad to explain in detail if you think my question is unclear.
Thanks

I wouldn't do that. If this is a production service then you've got a proper web server running in front of it, right? Apache, or nginx or something. That can do logging, and can do it well, and can write to a form that won't bloat your database, and there's a wealth of analytical tools for log analysis.
You are going to have to duplicate a lot of that functionality in your decorator, such as when you want to switch it on or off, or change the log level. The only thing you'll get by doing it all in django is the possibility of ultra-fine control, such as only logging views of blog posts with id numbers greater than X or something. But generally you'd not want that level of detail, and you'd log everything and do any stripping at the analysis phase. You've not given any reason currently why you need to do it from Django.
If you really want it in a RDBMS, reading an apache log file into Postgres or MySQL or one of those expensive ones is fairly trivial.

One thing you should keep in mind is that SQL databases don't offer you a very good writing performance (in comparison with reading), so if you are experiencing heavy loads you should probably look for a better in-memory solution (eg. some key-value-store like redis).
But keep in mind, that, especially if you would use a non-sql solution you should be aware what you want to do with the collected data (just display something like a 'log' or do some more in-deep searching/querying on the data).
If you want to identify different users from the same IP address you should probably look for a cookie-based solution (if you are using django's session framework the session's are per default identified through a cookie - so you could just simply use sessions). Another solution could be doing the logging 'asynchronously' via javascript after the page has loaded in the browser (which could give you more possibilities in identifying the user and avoid additional load when generating the page).

In a website with no users, are cookies the only way to prevent people from repeating actions?

I'm creating a website and I don't want to have user/membership.
However, people using the site can do things like vote and flag questionable content. Is the only way to identify and prevent users from doing this repeatedly by using cookies?
I realize a user can just clear their cookies but I can't think of another way.
Suggestions for this scenario?

Well you could map a cookie + ip-adress in a datarecord in your database. To identify the user. So if the ip exists in the database, you simply just add the cookie, but check the cookie first to avoid unessesary database calls.
This is not optimal though, since schools etc might have the same ips on a lot of computers.
You can always just adapt openid!

Marko & Visage are correct,
Just to add though, you might want to store each vote with the timestamp,IP, etc... so at least if someone does try to "game" your site, you'd be able to rollback sets of votes made from the same location or within a very short amount of time (i.e. from a bot)

+1 To all that others have already said. Here's another middle-way idea:
Use cookies as primary means of limiting voting. If a cookie is not found, check the IP address. Allow no more than, say, 1 vote per 5 minutes from the same IP.

Cookies are not enough, as you said it could be cleared/expired.
IP address tracking is also not an option because of DHCP and firewalls.
The only thing that is ~100% sure is users, but then again, one person can register multiple accounts.
I'll go with cookies, as the simplest ant least obtrusive way. If someone really wants to play the system, he will find a way whatever you try to prevent it.

Even with membership a user can register multiple times and vote.
Cookies are one way to handle this but people who know that they can delete cookie can vote again.
You can catch the IP of the voter and restrict based on that. But many people will have same IP.
Sadly there is no other way.

Yes, you are right.
HTTP is stateless, so there is no way of determining if the origin of a request you receive now is the same or different to the origin of a request you received, say, 5 minutes ago.
Cookies are the only way around this. Even server side sessions rely on cookies to maintain session identity across requests (ignoring the security nightmare of passing the sesison ID in the URL, which anyone with malicious intent can sidestep trivially).

There will always be people gaming the system if it suits them. Moreover, if you make it such that you don't need cookies at all you'd be open to very simple attacks.
I think you'll want to consider ways to increase the economic cost of users operating under a cloud of suspicion.
For example, if a user with the same cookie tries to re-submit the vote, that can obviously be stopped easily.
If a user with a different cookie but from the same IP does the same thing, it could be coming from a proxy/firewall so you may want to be cautious and force them to do something extra, like a simple CAPTCHA. Once they've done this, if they behave properly nothing new is required as long as their new cookie stays with them.
This implies that people without cookies can still participate, but they have to re-enter the letter sequence or whatever each time. A hassle, but they're likely used to sites not working without cookies. They'd be able to make an exception if required.
You really won't be able to deal with users sitting over a pool of IPs (real or otherwise) and exploiting new and dynamic attack vectors on your site. In that case, their economic investment will be more than yours and, frankly, you'll lose. At that point, you're just competing to maintain the rules of your system. That's when you should explore requiring signup/email/mobile/SMS confirmation to up the ante.

You can add GET variables and URL parts to serve as cookies - some sites do that to allow logins and/or tracking when cookies are disabled. Generate the part using source IP and user agent string, for example.
site.com/vote?cookie=123456
site.com/vote/cookie123456

comparison of ways to maintain state

There are various ways to maintain user state using in web development.
These are the ones that I can think of right now:
Query String
Cookies
Form Methods (Get and Post)
Viewstate (ASP.NET only I guess)
Session (InProc Web server)
Session (Dedicated web server)
Session (Database)
Local Persistence (Google Gears) (thanks Steve Moyer)
etc.
I know that each method has its own advantages and disadvantages like cookies not being secure and QueryString having a length limit and being plain ugly to look at! ;)
But, when designing a web application I am always confused as to what methods to use for what application or what methods to avoid.
What I would like to know is what method(s) do you generally use and would recommend or more interestingly which of these methods would you like to avoid in certain scenarios and why?

While this is a very complicated question to answer, I have a few quick-bite things I think about when considering implementing state.
Query string state is only useful for the most basic tasks -- e.g., maintaining the position of a user within a wizard, perhaps, or providing a path to redirect the user to after they complete a given task (e.g., logging in). Otherwise, query string state is horribly insecure, difficult to implement, and in order to do it justice, it needs to be tied to some server-side state machine by containing a key to tie the client to the server's maintained state for that client.
Cookie state is more or less the same -- it's just fancier than query string state. But it's still totally maintained on the client side unless the data in the cookie is a key to tie the client to some server-side state machine.
Form method state is again similar -- it's useful for hiding fields that tie a given form to some bit of data on the back end (e.g., "this user is editing record #512, so the form will contain a hidden input with the value 512"). It's not useful for much else, and again, is just another implementation of the same idea behind query string and cookie state.
Session state (any of the ways you describe) are all great, since they're infinitely extensible and can handle anything your chosen programming language can handle. The first caveat is that there needs to be a key in the client's hand to tie that client to its state being stored on the server; this is where most web frameworks provide either a cookie-based or query string-based key back to the client. (Almost every modern one uses cookies, but falls back on query strings if cookies aren't enabled.) The second caveat is that you need to put some though into how you're storing your state... will you put it in a database? Does your web framework handle it entirely for you? Again, most modern web frameworks take the work out of this, and for me to go about implementing my own state machine, I need a very good reason... otherwise, I'm likely to create security holes and functionality breakage that's been hashed out over time in any of the mature frameworks.
So I guess I can't really imagine not wanting to use session-based state for anything but the most trivial reason.

Security is also an issue; values in the query string or form fields can be trivially changed by the user. User authentication should be saved either in an encrypted or tamper-evident cookie or in the server-side session. Keeping track of values passed in a form as a user completes a process, like a site sign-up, well, that can probably be kept in hidden form fields.
The nice (and sometimes dangerous) thing, though, about the query string is that the state can be picked up by anyone who clicks on a link. As mentioned above, this is dangerous if it gives the user some authorization they shouldn't have. It's nice, though, for showing your friends something you found on the site.

With the increasing use of Web 2.0, I think there are two important methods missing from your list:
8 AJAX applications - since the page doesn't reload and there is no page to page navigation, state isn't an issue (but persisting user data must use the asynchronous XML calls).
9 Local persistence - Browser-based applications can persist their user data and state to the local hard drive using libraries such as Google Gears.
As for which one is best, I think they all have their place, but the Query String method is problematic for search engines.

Personally, since almost all of my web development is in PHP, I use PHP's session handlers.
Sessions are the most flexible, in my experience: they're normally faster than db accesses, and the cookies they generate die when the browser closes (by default).

Avoid InProc if you plan to host your website on a cheap-n-cheerful host like webhost4life. I've learnt the hard way that because their systems are over subscribed, they recycle the applications very frequently which causes your session to get lost. Very annoying.
Their suggestion is to use StateServer which is fine except you have to serialise/deserialise the session eash post back. I love objects and my web app is full of them. I'm concerned about performance when switching to StateServer. I need to refactor to only put the stuff I really need in the session.
Wish I'd know that before I started...
Cheers, Rob.

Be careful what state you store client side (query strings, form fields, cookies). Anything security-related should not be stored client-side, except maybe a session identifier if it is reasonably obscured and hard to guess. There are too many websites that have settings like "authenticated=true" and store those in a cookie or query string or hidden form field. It is trivial for a user to bypass something like that. Remember that ANY input coming from a client could have been tampered with and should not be trusted.

Signed Cookies linked to some sort of database store when you need to grab data. There's no reason to be storing data on the client side if you have a connected back-end; you're just looking for trouble if this is a public facing website.

It's not some much a question of what to use & what to avoid, but when to use which. Each has a particular circumstances when it is the best, and a different circumstance when it's the worst.
The deciding factor is generally lifetime of the data. Session state lives longer than form fields, and so on.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js