Edit robots.txt from admin panel in OpenCart

I want to edit my robots.txt file in OpenCart 2.* from the admin panel. Does anybody know how to do it?
Is it a standard feature, or do I need to install an extension?
A list of modules or code examples would be perfect.

OpenCart does not ship with a robots.txt file by default, and there is no setting for it in the admin panel either.
However, this is very simple: you can create a robots.txt file in your website's root directory yourself. A brief example of the syntax is given below.
robots.txt is a text (not HTML) file you put on your site to tell search robots which pages you would like them not to visit.
It is important to clarify that robots.txt is not a way of preventing search engines from crawling your site (i.e. it is not a firewall or a kind of password protection). Putting up a robots.txt file is more like hanging a note saying "Please do not enter" on an unlocked door: you cannot stop thieves from coming in, but the good guys will not open the door and enter.
Block all web crawlers from all content:

User-agent: *
Disallow: /

Block a specific web crawler from a specific folder:

User-agent: Googlebot
Disallow: /no-google/

Block a specific web crawler from a specific web page:

User-agent: Googlebot
Disallow: /no-google/blocked-page.html

Sitemap parameter:

User-agent: *
Disallow:
Sitemap: http://www.example.com/none-standard-location/sitemap.xml
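
For an OpenCart store specifically, a common starting point (illustrative only; adjust the paths to your own installation, especially if you have renamed the admin folder) is to keep crawlers out of the admin and system folders:

User-agent: *
Disallow: /admin/
Disallow: /system/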
Hope it helps.

You would have to write a module that can edit the file. Look at file_get_contents and file_put_contents.
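
A minimal sketch of that idea in plain PHP (the file location and the "robots" form field name are assumptions for illustration, not part of any stock OpenCart API):

<?php
// Path to robots.txt in the site root (assumed location)
$file = $_SERVER['DOCUMENT_ROOT'] . '/robots.txt';

// On save, write the submitted text back to the file
if (isset($_POST['robots'])) {
    file_put_contents($file, $_POST['robots']);
}

// Load the current contents (empty string if the file does not exist yet)
$content = is_file($file) ? file_get_contents($file) : '';

// Render an editable form
echo '<form method="post">'
   . '<textarea name="robots" rows="15" cols="80">' . htmlspecialchars($content) . '</textarea>'
   . '<button type="submit">Save</button>'
   . '</form>';

In a real OpenCart module you would put the read/write logic in an admin controller and check user permissions before writing to disk.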

Related

Decoupled CMS, Selective crawling of the db server

We are running a decoupled CMS (using Wordpress as our db) and want to prevent search engines from crawling our posts from this server. We have post templates on that server so writers can preview their posts and Google has found them.
Am I able to detect that a crawler is trying to access these pages in my .htaccess file and redirect to the www server? Is redirecting the wrong solution here? Can robots.txt block a generic pattern such as category/post-title?
Three things need to still happen:
We need to still be able to access db.site.com/wp-admin
Writers still need to preview their posts, which means they cannot be redirected.
db.site.com/wp-content/uploads needs to be accessible so social sites can pull images.
Here is how the posts are set up. Basically, I want to block or redirect posts from db.site.com:
db.site.com/category/post-title
www.site.com/category/post-title
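
One possible sketch of the .htaccess approach, assuming Apache with mod_rewrite; the crawler user-agent list is illustrative and the host names are taken from the question:

RewriteEngine On
# Never touch the admin or uploads (requirements 1 and 3)
RewriteCond %{REQUEST_URI} !^/wp-admin [NC]
RewriteCond %{REQUEST_URI} !^/wp-content/uploads [NC]
# Only redirect known crawlers, so writers can still preview posts (requirement 2)
RewriteCond %{HTTP_USER_AGENT} (googlebot|bingbot|slurp|duckduckbot) [NC]
# Send category/post-title URLs on db.site.com to the www server
RewriteRule ^([^/]+/[^/]+)/?$ http://www.site.com/$1 [R=301,L]

A 301 also tells the crawler that the www URL is the canonical one; a robots.txt rule would only stop further crawling, not remove pages that are already indexed.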

Redirect module in Sitecore

I have installed the Redirect module in Sitecore. Inside Modules I created a "Redirect Url" item. In the Redirect Url item I set Requested Url to "http://domainname/pagename" and selected Redirect To from the content tree. But it is not working. Can anyone tell me what I am doing wrong?
I have created redirect pattern.
It all depends on which module implementation you are using. I have heard multiple complaints about the functionality of the original one (it seems to be discontinued entirely), so people have made their own forks. The best implementation at the moment is by Chris Adams and Max Slabyak; the module, with sources, packages, and good documentation, is available on GitHub and is being maintained over time.
With that Redirect Module installed, I do the following:
Under the /sitecore/system/Modules/Redirect Module folder in Sitecore, create a new redirect pattern called Pagename Test
Set the requested expression to ^/pagename/?
Leave the response status code at 301
Set the source item to the actual page item that should serve that redirect request
Do not forget to publish the redirect pattern (and the module itself, if you have not yet)
Then, when I hit http://myhostname/pagename/, I am redirected to the desired page with a 301 status code.
Hope this helps, and please let us know if that worked out for you.

Robots.txt: disallow a folder name, regardless of the depth at which it shows up

So I have to disallow search engines from indexing our REST web service responses (it's a Sitecore website). All of them have the same name in the URL but show up at different levels of the server hierarchy, and I was wondering if I can write a "catch-all" entry in our robots file or if I am doomed to write an extensive list.
Can I add something like
Disallow: */ajax/*
to catch all folders named "ajax" regardless of where they appear?
The original robots.txt specification doesn't say anything about wildcards, but Google (see Google's robots.txt specification) and Bing allow the use of wildcards in robots.txt files.
Disallow: */ajax/*
That Disallow rule is valid for all /ajax/ URLs, no matter the nesting level at which /ajax/ appears.
You should be able to just use Disallow: /*ajax. Similar question over here:
How to disallow service api and multilingual urls in robots.txt
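
For example, with the wildcard pattern from above (honored by Google and Bing; other crawlers may ignore it):

User-agent: *
Disallow: */ajax/*

# Matched (blocked for wildcard-aware crawlers):
#   /ajax/render
#   /products/ajax/render
#   /a/b/c/ajax/data.json
# Not matched:
#   /ajaxdocs/page   (no "/ajax/" path segment)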

Is it possible to list sitemaps for different domains in the same robots.txt file?

We have multiple websites served from the same Sitecore instance and same production web server. Each website has its own primary and Google-news sitemap, and up to now we have included a sitemap specification for each in the .NET site's single robots.txt file.
Our SEO expert has flagged the presence of different domains in the same robots.txt as a possible issue, and I can't find any documentation definitively stating one way or the other. Thank you.
This should be OK for Google at least. It may not work for other search engines such as Bing, however.
According to https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt:
sitemap: [absoluteURL]
[absoluteURL] points to a Sitemap, Sitemap Index file or equivalent URL. The URL does not have to be on the same host as the robots.txt file. Multiple sitemap entries may exist. As non-group-member records, these are not tied to any specific user-agents and may be followed by all crawlers, provided it is not disallowed.
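
So, per that spec text, a single robots.txt along these lines (host names illustrative) is acceptable to Google:

User-agent: *
Disallow:

Sitemap: http://www.first-site.com/sitemap.xml
Sitemap: http://www.first-site.com/news-sitemap.xml
Sitemap: http://www.second-site.com/sitemap.xml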
The best way to achieve this is to handle robots.txt from the Sitecore content tree.
We also have a similar structure, where we deliver multiple websites from a single Sitecore instance.
I have written a blog post about exactly this; please find it below.
http://darjimaulik.wordpress.com/2013/03/06/how-to-create-handler-in-sitecore/

How to use Wordpress and Django together

I host my Django site Wantbox.com on Dreamhost. I'd like to use Wordpress for the Wantbox blog and locate it here: http://wantbox.com/blog/
How do I configure Django to lay off "/blog/" so Wordpress can do its thing? Right now I have a catch-all URL pattern that sends anything not matched elsewhere to the homepage, and this catch-all is catching "/blog/" and doing just that.
Thanks for your help!
UPDATE:
It's not necessary for Django data to be accessible by Wordpress or vice versa. Also, I'm open to a Django-based blog solution if it works as well as the tried-and-true Wordpress that I'm quite familiar with.
Via Google I found a similar SO question here. The answer is to create an .htaccess file in the root of your new blog folder. In my case, the blog root directory is: ~/wantbox.com/public/blog/
My .htaccess file in this directory has one line:
PassengerEnabled off
Now the URL http://wantbox.com/blog/ is ignored by Django and handled by Wordpress. Very nice.