Decoupled CMS, Selective crawling of the db server - regex

We are running a decoupled CMS (using WordPress as our db) and want to prevent search engines from crawling our posts on this server. We have post templates on that server so writers can preview their posts, and Google has found them.
Can I detect in my .htaccess file that a crawler is trying to access these pages and redirect it to the www server? Is redirecting the wrong solution here? Can robots.txt block a generic pattern such as category/post-title?
Three things still need to happen:
We need to still be able to access db.site.com/wp-admin
Writers still need to preview their posts, which means they cannot be redirected.
db.site.com/wp-content/uploads needs to be accessible so social sites can pull images.
Here is how the posts are set up. Basically, I want to block or redirect posts from db.site.com:
db.site.com/category/post-title
www.site.com/category/post-title
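On the robots.txt part of the question: robots.txt is served per hostname, so db.site.com can carry its own rules without affecting www.site.com. The sketch below is only an illustration, not a tested production config. It assumes a rule set that allows /wp-content/uploads/ and disallows everything else, and uses Python's standard urllib.robotparser to check how a compliant crawler would read it. The helper names build_parser and crawler_may_fetch are made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Assumed contents of http://db.site.com/robots.txt (illustrative only).
# The Allow line comes first because Python's parser applies the first
# matching rule.
ROBOTS_TXT = """\
User-agent: *
Allow: /wp-content/uploads/
Disallow: /
"""


def build_parser(text):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def crawler_may_fetch(parser, url, agent="Googlebot"):
    """Return True if a compliant crawler may fetch the given URL."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    for url in (
        "http://db.site.com/category/post-title",
        "http://db.site.com/wp-content/uploads/2020/01/image.png",
    ):
        print(url, crawler_may_fetch(parser, url))
```

robots.txt only affects well-behaved crawlers, so writers previewing posts and people logging in at /wp-admin are not touched by it. It also stops crawling rather than guaranteeing removal of URLs that are already indexed.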

Related

Redirecting old urls to new urls in Django

After publishing my site in Django, I changed my slugs, and now the old pages show as errors in Google search. Is there a way to automatically redirect them without having to write an individual redirect for each page?
There are a few things you need to do to make sure that your website gets crawled properly.
As for the redirection, you can use django.http.HttpResponsePermanentRedirect. Just keep the old view, and when a user navigates to it, redirect them to the proper URL.
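One way to avoid writing a redirect per page is to keep a single old-slug to new-slug table and compute the target URL from it. The sketch below shows only that lookup in plain Python; the table contents and the names OLD_TO_NEW_SLUGS and redirect_target are hypothetical. In a Django view, a non-None result would be handed to HttpResponsePermanentRedirect.

```python
# Hypothetical mapping from old slugs to their replacements.
OLD_TO_NEW_SLUGS = {
    "old-post": "new-post",
    "my-first-entry": "first-entry",
}


def redirect_target(path, slug_map):
    """Return the new path for a URL whose last segment is an old slug.

    Returns None when the last path segment is not a known old slug,
    in which case the caller should not redirect.
    """
    parts = path.strip("/").split("/")
    new_slug = slug_map.get(parts[-1])
    if new_slug is None:
        return None
    parts[-1] = new_slug
    return "/" + "/".join(parts) + "/"


if __name__ == "__main__":
    print(redirect_target("/blog/old-post/", OLD_TO_NEW_SLUGS))
    print(redirect_target("/blog/unknown/", OLD_TO_NEW_SLUGS))
```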
You should also create a sitemap, which lists all of the URLs for your website. You can then submit this sitemap to Google using their webmaster tools if you have not already done so. This tells their crawler about every page it needs to crawl, so you do not have to worry about it missing any.
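As a rough illustration of what such a sitemap contains, the sketch below builds a minimal one with Python's standard xml.etree.ElementTree. The function name build_sitemap and the example.com URLs are placeholders. Django also ships its own sitemap framework, which this sketch does not use.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal sitemap XML string with one <url> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Placeholder URLs; a real site would list its current (new) slugs.
    print(build_sitemap([
        "https://example.com/blog/new-post/",
        "https://example.com/blog/first-entry/",
    ]))
```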

Django URL conf and Backbone.js Router

I have a backbone.js single-page app that is all set up with the router (well, actually a Backbone.Marionette app with a Backbone.Marionette AppRouter, but nevertheless). However, the backend is based in Django, where I do not have the URL conf directing to views for all URLs that are already in the backbone.js routes.
Based on the existing URLs in the Django URL conf, Backbone.js will serve the backbone routes regardless of what is listed in the Django conf - it seems something, anything just needs to be there.
Do I need to have proper Django views in order to offer a fallback for older browsers/SEO?
What are the best practices to coordinate the Django URL conf and the Backbone.js Router?
I've found a post that addresses this issue quite well:
http://duganchen.ca/single-page-web-app-architecture-done-right/
Briefly, my reasons for including a fallback are non-JavaScript browsers and SEO. At the time of this post, non-JavaScript browsers account for ~1.4% of users (less than 2% according to everything I've read), making SEO the major consideration. SEO may not be relevant for everyone reading this post, in which case this can be skipped.
I found Thomas Davis' tutorial using phantom.js quite helpful. http://backbonetutorials.com/seo-for-single-page-apps/
However, another issue that I needed to account for was the history API, which is supported only by the latest IE browsers. About 15% of my client's users are on IE <= 9, so this was also a problem.
In the end, I also needed to use history.js. All in all, this was a lot of work to update an otherwise very simple website. However, I learned a lot from this ordeal.
In my opinion, if your backbone app is truly a single page, then you don't need any Django views whatsoever. You can serve your index.html as a static file (in production, not even through Django) and let backbone's router take care of your URL configuration, as you're doing already. You can use backbone's history to navigate to fake URLs, add URL parameters, etc., for resources in your app.

Migrating from wordpress to django without losing pagerank

I have a blog in WordPress (www.ashwinm.com) which I am looking to migrate to Django, as I am very much impressed with it. Is there any way I can migrate to Django without losing my current pagerank (which is 3)?
I don't mind losing all current contents of this blog as it is too old.
You could take a look at something like django-wordpress. That would allow you to keep your current content in the existing wordpress DB (read-only) and continue to develop other portions of the site with Django.
If you have a high pagerank, that is probably because you have content and the content is linked to by other people. You should try to keep that content in some form (it doesn't have to be exactly the same form), or at least ensure that every URL that is being linked to redirects to something useful. No one who follows a previously valid link to your website should get a 404.
Your content and your inbound links together are responsible for your pagerank, so if you let both die then you're back at square one, regardless of what web application framework you are using.
If I were doing it I would probably set up the new blog with Django and import the data manually. Or, if there is simply way too much data, I could move the Wordpress server to be served from a directory such as /archive and instruct my webserver to 301 redirect all of the old Wordpress blog entry URLs to the new directory. You would have to continue to maintain the Wordpress installation to some degree, but you would be 100% certain to keep all of your pagerank.
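The answer above has the web server itself issue the 301s. Purely to illustrate the mapping, here is the same idea as a tiny WSGI application in Python, which is a swapped-in technique rather than what the answer prescribes. It assumes the old blog has been moved under /archive; the name redirect_to_archive is made up for this sketch.

```python
# Assumed location of the relocated WordPress install.
ARCHIVE_PREFIX = "/archive"


def redirect_to_archive(environ, start_response):
    """WSGI app: answer every request with a 301 to the same path under /archive."""
    location = ARCHIVE_PREFIX + environ.get("PATH_INFO", "/")
    query = environ.get("QUERY_STRING", "")
    if query:
        # Preserve the query string so old links such as /?p=123 keep working.
        location += "?" + query
    start_response(
        "301 Moved Permanently",
        [("Location", location), ("Content-Length", "0")],
    )
    return [b""]
```

In practice this would be mounted only for the old blog-entry URL patterns, so that requests under /archive itself are not redirected again.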

Django: control access to "static" files

OK, I know that serving media files through Django is not recommended. However, I'm in a situation where I'd like to serve "static" files using fine-grained access control through Django models.
Example: I want to serve my movie library to myself over the web. I'm often travelling and I'd like to be able to view any of my movies wherever I am, provided I have internet access. So I rip my DVDs, upload them to my server and build this simple Django application coupled with some embeddable video player.
To avoid any legal repercussions, I'd like the application to grant access only to logged-on users with the proper permissions (i.e. myself and people living in the same household, who can, like me, access the real DVDs at their convenience), and to deny it to other users (i.e. people who posted comments on my blog) by returning an HTTP 404.
Now, serving these files directly using Apache and mod_wsgi is rather troublesome because when an HTTP request for a media file (e.g. http://video.mywebsite.com/my-favorite-movie/) comes in, I need to validate against my user database that the person at the other end has the proper permissions.
Question: can I achieve this effect without serving the media files directly through a Django view? What are my options?
One thing I did think of is to write a simple script that takes a session ID and a video's slug and returns some boolean indicating if the user may (or may not) access the video file. Then, somehow request mod_wsgi to execute this script before accessing the requested URL and return an HTTP 404 if the script failed. However, I don't have a clue if this is even possible.
Edit: Posting this question clarified some of my ideas for search and I've come across mod_python's file wrapper extension. Does anyone have enough experience with that to validate that it is a viable solution?
Yes, you can hook into Django's authentication from Apache. See this how-to:
Authenticating against Django’s user database from Apache
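The boolean check described in the question (session ID plus video slug in, allow or deny out) could look roughly like the sketch below. It is plain Python with in-memory dictionaries standing in for the session store and the permission model. SESSIONS, PERMISSIONS, may_access and status_for are all hypothetical names, and it does not show how Apache or mod_wsgi would invoke the check.

```python
# Stand-ins for the real session store and permission model (hypothetical data).
SESSIONS = {"abc123": "alice", "def456": "blog-commenter"}
PERMISSIONS = {"alice": {"my-favorite-movie"}}


def may_access(session_id, video_slug, sessions=SESSIONS, permissions=PERMISSIONS):
    """Return True only if the session belongs to a user allowed to view the video."""
    user = sessions.get(session_id)
    if user is None:
        return False  # unknown or expired session
    return video_slug in permissions.get(user, set())


def status_for(session_id, video_slug):
    """Map the check to the HTTP status the question asks for: 200 or 404."""
    return 200 if may_access(session_id, video_slug) else 404
```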

How to configure server for small hosting company for django-powered flash sites?

I'm looking at setting up a small company that hosts flash-based websites for artist portfolios. The customer control panel would be django-powered, and would provide the interface for uploading their images, managing galleries, selling prints, etc.
Since the majority of traffic to the hosted sites would end up at their top-level domain, resulting in only static media hits (the HTML page with the embedded flash movie), I could set up lighttpd or nginx to handle those requests and pass the django stuff back to apache/mod_whatever.
Seems as if I could set this all up on one box, with the django sites framework keeping each site's admin separate.
I'm not much of a server admin. Are there any gotchas I'm not seeing?
Maybe. I don't think the built-in admin interface is really designed to corral admins into their own sites. The sites framework is better suited to publishing the same content on multiple sites than to constraining users to one site or another. You'd be better off writing your own admin interface that enforces those separations.
As far as serving content goes, it seems like you could serve up a common (static) Flash file that uses a dynamic XML file to fill in content. If you use Django to generate the XML, that would give you the dynamic content you need.
This django snippet might be what you need to keep them separate:
http://www.djangosnippets.org/snippets/1054/
"A very simple multiple user blog model with an admin interface configured to only allow people to edit or delete entries that they have created themselves, unless they are a super user."
Depending on the number of sites you're going to host, it might be easier to write a single Django app once, with admin, and to create a separate Django project for each new site. This is simple, it works for sure, and as an added bonus you can add features to newer sites without running the risk of causing problems in older sites.
Then again, it might be handier to customize the admin so that the objects users can see are limited to those on the given site itself. This is fairly easy to do, although you might want to use RequestSite instead of the usual Site from the sites framework, as the latter requires separate settings for each site.
There exists this one method in the ModelAdmin which you can override to have manual control over the objects being edited.