Search engines and migrating a static site to a web app - django

We're replacing a static website with a Django app. All the URIs will change. The current website has a substantial presence in the search engine rankings and we don't want to mess that up too much. Is it simply a case of setting up 301 redirects to the new URIs, or is there something more subtle we need to do to ensure the search engines understand what's happened?

Normally, when you change your site, you will take a hit on your search result rankings that lasts for about 2-4 weeks.
Apex Internet has a good article on setting up the 301 redirects on Apache, IIS, and other servers. Take a look here.
Steven Hargrove also has a good article on it here, with a follow-up here.
In addition, Webmaster World has a thread on the impact of the 301s updating in Google, Yahoo, and others, as well as tips and a little more advice. Take a look at that here.
Lastly, here is an article from Google Groups on dynamic vs. static URLs that touches on changing structure and how it maps.
I was hoping to have more information for you, and a way to use the robots.txt file to help keep the rankings up when you start the migration. I'll keep looking and see what I can find for you. Cheers and good luck!

301s should in fact cover it.
Search engines are generally pretty good at this :)
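On the Django side, wiring up the permanent redirects can be as simple as a few RedirectView entries (a rough sketch; the old and new paths below are made-up examples):

# urls.py -- sketch: map old static-site paths to their new locations with
# permanent (301) redirects. The paths below are illustrative only.
from django.urls import path
from django.views.generic.base import RedirectView

urlpatterns = [
    path('about.html', RedirectView.as_view(url='/about/', permanent=True)),
    path('products/widgets.html',
         RedirectView.as_view(url='/products/widgets/', permanent=True)),
]

If the list of moved pages is long, the django.contrib.redirects app stores old-path/new-path pairs in the database instead of hard-coding them in urls.py.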

Related

Static site pagination with Google App Engine

I have an octopress/jekyll blog that I am trying to host with Google App Engine. Here's a different SO question that got me started: How to Regex for static webpage on Google App Engine?
However, I would ALSO like to get pretty URLs working (with no index.html required at the end of the URL).
e.g. /blog/post/ instead of /blog/post/index.html
For some of this I can wire up explicit rules (though it's pretty ugly). For the pagination pages (/blog/page2 for example) there is no way to know how many there will be, so I can't wire them up.
I dimly suspect this will require a Python script, but I'm wondering if there might be some regex magic that would accomplish the same thing. Either way, anyone have an idea? An example of a script that might work?
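For reference, here is the rough kind of script I was imagining (completely untested; it assumes the generated Octopress output is uploaded as application-readable files under a public/ directory, on the first-generation Python runtime with webapp2):

# app.py -- untested sketch: serve /blog/page2/ (and any other directory URL)
# from the matching index.html in the generated site.
import os
import webapp2


class PrettyUrlHandler(webapp2.RequestHandler):
    def get(self, path):
        # e.g. "blog/page2" -> "public/blog/page2/index.html"
        file_path = os.path.join('public', path, 'index.html')
        if os.path.isfile(file_path):
            self.response.headers['Content-Type'] = 'text/html'
            with open(file_path) as f:
                self.response.write(f.read())
        else:
            self.abort(404)


app = webapp2.WSGIApplication([(r'/(.+)/$', PrettyUrlHandler)])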

Ember hash URLs in Google

I am concerned about page ranking on Google in the following situation:
I am looking to convert my existing site, with 150k+ unique page results, to an Ember app. Currently the URLs are something like domain.com/model/id; with Ember and hash-change routing it will be /#/model/id. I really want history state, but the lack of IE support doesn't leave that as an option. So my sitemap for Google has lots and lots of great results using the old model/id URLs. On the Rails side I will test the browser for compatibility before rendering either the JS-rich app or the plain HTML/CSS. Does anyone have good SEO suggestions for making my current schema a success?
Linked below is my schema and the options I'm looking at:
http://static.allplaces.net/images/EmberTF.pdf
History state is awesome, but it looks like support is only at around 60% of browsers.
http://caniuse.com/history
Thanks guys for the suggestions; the Google guide is similar to what I'm going to try. I will roll it out to one client this month and see what Webmaster Tools and Analytics show.
Here is everything you need to make your hash links SEO friendly: https://developers.google.com/webmasters/ajax-crawling/
Basically, you write your whole app with hash links, but you have to add "!" to them, so you have #!/model/id. Next, you must have all pages generated somewhere, and if Google asks for them, return "plain HTML" as described here: https://developers.google.com/webmasters/ajax-crawling/docs/getting-started
Use Google Webmaster Tools to check whether your site is crawlable.
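Server-side, the idea is the same in any framework; here is a rough Django-flavoured sketch (the view name and the snapshots/ directory layout are assumptions for illustration, not something from the Google guide):

# views.py -- sketch of handling Google's _escaped_fragment_ requests.
from django.http import HttpResponse

def app_or_snapshot(request):
    fragment = request.GET.get('_escaped_fragment_')
    if fragment is not None:
        # Googlebot turns "#!/model/id" into "?_escaped_fragment_=/model/id",
        # so serve the pre-generated plain-HTML snapshot for that route.
        with open('snapshots%s.html' % fragment) as f:
            return HttpResponse(f.read())
    # Normal visitors get the JS-rich Ember page.
    with open('index.html') as f:
        return HttpResponse(f.read())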
I'm not sure if you're aware that you can configure Ember to use the browser history for the location API and keep using your pages the way they are referenced now. All you need to do is configure the Router's location property:
App.Router.reopen({
  location: 'history'
});
See more details about specifying the location API here.

Broken links and images in search engine cached pages only

I've got a weird problem: my site shows up perfectly in all browsers, but when I check cached pages of the site in Google, Bing, or even Yahoo, all of them show broken links and images because some links get overridden.
For example, the direct URL http://www.expatads.com/47-Thailand/ displays perfectly.
Here's the Google cache of the same URL:
http://webcache.googleusercontent.com/search?q=cache:qFAzM4VMsJsJ:www.expatads.com/47-Thailand/+&cd=1&hl=en&ct=clnk
What I want to know is the best way to reproduce such errors so they are visible, instead of waiting for search engines to cache and show the pages, since web browsers do not show any error even though there is a path error causing this.
I'd appreciate it if anyone could give me a way to reproduce these errors using a browser, some software, or whatever.
I believe you are mistaken about this being an error. If you take a look at the screenshot of Google's search result for your page, the images are shown.
It appears that Google's cache does not rewrite relative URLs, which makes some sense because it wouldn't always work and some sites might not allow hotlinking, etc. So all the resources linked to on your page using relative links won't show up in Google's cached version.
If you would rather see what your site looks like in other browsers you may want to try Browsershots. This will give you screenshots from a huge number of browsers in order to test compatibility.
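If you'd like to spot those relative references yourself rather than wait for the next crawl, a short script like this one will list them (a rough sketch using only the Python standard library; the URL is the one from your question):

# list_relative_refs.py -- sketch: fetch a page and print relative src/href
# references, which are the ones that break when the HTML is served from
# another origin such as a search-engine cache.
from html.parser import HTMLParser
from urllib.request import urlopen

PAGE = 'http://www.expatads.com/47-Thailand/'

class RelativeRefFinder(HTMLParser):
    def handle_starttag(self, tag, attrs):
        for attr, value in attrs:
            if attr in ('src', 'href') and value and not value.startswith(
                    ('http://', 'https://', '//', 'mailto:', '#')):
                print(tag, attr, value)

RelativeRefFinder().feed(urlopen(PAGE).read().decode('utf-8', 'replace'))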

Tracking User Actions on Landing Pages in Django

I'm developing a web application. It's months away from completion, but I would like to build a landing page to show to potential customers to explain things and gauge their interest, basically collecting their email address and, if they feel like it, additional information like names and addresses.
Because I'm already using Django to build my site, I thought I might use another Django app to serve as this landing page. The features I need are:
to display a fairly static page and potentially a series of pages,
collect emails (and additional customer data)
track their actions (e.g., they got through the first two pages but didn't fill out the final page).
Is there any pre-existing Django app that provides any of these features?
If there isn't a Django app, does anyone know of another, faster/better way than building my own? Perhaps a pre-existing web service that you can skin and make look like your own? Maybe the perfect system exists but it's in PHP? I'm open to whatever.
Option 1: Google Sites
You can set it up very, very quickly, though your monitoring wouldn't be as detailed as you're asking for. Still, easy and fast!
Option 2: bbclone
Something else that may be helpful is to set up a PHP-based site (WordPress or something) and use bbclone for tracking on it. I've found bbclone to be pretty thorough in reporting what everyone does, though it's been a while since I used it.
Option 3: Django Flatpages
The flatpages Django contrib app is pretty handy for making static flat pages. I'd probably just embed a Google Docs Form to collect email addresses (as that's super fast and lets you get back to real work). But this suggestion would still leave you needing to figure out how to get the level of detail you want on the stats end.
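For reference, enabling flatpages is just a little configuration (a rough sketch; adjust the app list and URL prefix to your project):

# settings.py -- the flatpages app needs the sites framework as well.
INSTALLED_APPS = [
    # ... your existing apps ...
    'django.contrib.sites',
    'django.contrib.flatpages',
]
SITE_ID = 1

# urls.py -- serve flatpages under /pages/ (or use the fallback middleware).
from django.urls import include, path

urlpatterns = [
    # ... your existing patterns ...
    path('pages/', include('django.contrib.flatpages.urls')),
]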
Perhaps consider Google Analytics anyway?
Regardless, I suggest you use Google Analytics with everything. That'll work with anything you do really, and for all I know, perhaps you can find a way to get the stats you're really looking for out of it.

Is it possible to be attacked with XSS on a static page (i.e. without PHP)?

A client I'm working for has mysteriously ended up with some malicious scripting going on on their site. I'm a little baffled, however, because the site is static and not dynamically generated: no PHP, Rails, etc. At the bottom of the page, though, somebody opened a new tag and a script. When I opened the file on the web server, stripped out the malicious stuff, and re-uploaded it, the script was still there. How is this possible? And more importantly, how can I combat this?
EDIT:
To make it weirder, I just noticed the script only shows up in the source if the page is accessed directly as 'domain.com/index.html' but not as just 'domain.com'.
EDIT2:
At any rate, I found a PHP file (x76x09.php) sitting on the web server that must have been updating the HTML file despite my attempts to strip out the script. I'm currently in the clear, but I do have some work to do to make sure rogue files don't just appear again and cause problems. If anyone has any suggestions on this, feel free to leave a comment; otherwise, thanks for the help everyone! It was very much appreciated!
No, it's not possible unless someone has access to your files. So in your case, someone has access to your files.
Edit: it's best to ask on serverfault.com about what to do when a server is compromised, but:
change your shell passwords
have a look at /var/log/messages for login attempts
finger root
have a look at the last modification time of those files (a quick sketch for this is below)
There is also a high probability that the files were altered via HTTP using a vulnerability in a software component you run alongside the static files.
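For that modification-time check, something like this quick Python sketch will list everything under the web root changed in the last week (the path and window are assumptions; adjust them):

# find_recent.py -- sketch: list files under the web root modified recently.
import os
import time

DOCROOT = '/var/www/html'      # assumption: change to your actual web root
WINDOW = 7 * 24 * 3600         # look back one week

now = time.time()
for dirpath, _, filenames in os.walk(DOCROOT):
    for name in filenames:
        path = os.path.join(dirpath, name)
        mtime = os.path.getmtime(path)
        if now - mtime < WINDOW:
            print(time.ctime(mtime), path)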
To the point about the site not having pages executing on the server, XSS is absolutely still possible using a DOM-based attack. Usually this will relate to JavaScript execution outputting content to the page. Just last week WhiteHat Security had an XSS vulnerability identified on a purely "static" page.
It may well be that the attack vector relates to file-level access, but I suggest it's also worthwhile taking a look at what's going on JS-wise.
You should probably talk to your hosting company about this. Also, check that your file permissions aren't more lenient than they should be for your particular environment.
That's happened to me before: it happens if they get your FTP details. So whoever did it obviously got hold of your FTP details somehow.
The best thing to do is change your password and contact your web hosting company to figure out a better solution.
Unfortunately, FTP isn't the most secure...