Error getting .htaccess to direct googlebot using _escaped_fragment_ - regex

I am trying to get my pages indexed on google using a prerendering service for my backbone app.
I know the setup works fine when I specifically add Googlebot to the user-agent list, but I've been advised against this in favor of using the _escaped_fragment_ method. The only problem is that the _escaped_fragment_ parameter isn't getting passed correctly. Can someone help, please?
Thanks!
# html5 pushstate (history) support:
<ifModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.com$ [OR]
RewriteCond %{HTTPS} !on
RewriteRule ^(.*)$ https://www.example.com/$1 [R=301,L]
# If requested resource exists as a file or directory
# (REQUEST_FILENAME is only relative in virtualhost context, so not usable)
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -f [OR]
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -d
# Go to it as is
RewriteRule ^ - [L]
# If non existent
# If path ends with / and is not just a single /, redirect to without the trailing /
RewriteCond %{REQUEST_URI} ^.*/$
RewriteCond %{REQUEST_URI} !^/$
RewriteRule ^(.*)/$ $1 [R,QSA,L]
# Handle Prerender.io
RequestHeader set X-Prerender-Token "xxxxxxxx"
RewriteCond %{HTTP_USER_AGENT} baiduspider|facebookexternalhit|twitterbot|rogerbot|linkedinbot|embedly|quora\ link\ preview|showyoubot|outbrain|pinterest|slackbot|vkShare|W3C_Validator [NC,OR]
RewriteCond %{QUERY_STRING} _escaped_fragment_
# Proxy the request
RewriteRule ^(?!.*?(\.js|\.css|\.xml|\.less|\.png|\.jpg|\.jpeg|\.gif|\.pdf|\.doc|\.txt|\.ico|\.rss|\.zip|\.mp3|\.rar|\.exe|\.wmv|\.doc|\.avi|\.ppt|\.mpg|\.mpeg|\.tif|\.wav|\.mov|\.psd|\.ai|\.xls|\.mp4|\.m4a|\.swf|\.dat|\.dmg|\.iso|\.flv|\.m4v|\.torrent|\.ttf|\.woff))(.*) http://service.prerender.io/https://www.example.com/$2 [P,L]
# If non existent
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !index
RewriteRule (.*) index.html [L,QSA]
</ifModule>
All the Apache modules are loaded and working.
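To see what the proxy rule above actually matches, here is a rough Python model of it. It is a sketch, not Apache itself: Python's `re` stands in for Apache's PCRE (the constructs used here behave the same in both, as far as we know), the function name `prerender_target` is hypothetical, and `path` is assumed to be the per-directory path without its leading slash. Following mod_rewrite's default, the original query string is passed through because the substitution contains no `?`.

```python
import re

# Extensions from the RewriteRule above (the duplicated ".doc" is listed once here).
EXTS = ["js", "css", "xml", "less", "png", "jpg", "jpeg", "gif", "pdf", "doc",
        "txt", "ico", "rss", "zip", "mp3", "rar", "exe", "wmv", "avi", "ppt",
        "mpg", "mpeg", "tif", "wav", "mov", "psd", "ai", "xls", "mp4", "m4a",
        "swf", "dat", "dmg", "iso", "flv", "m4v", "torrent", "ttf", "woff"]

# Same shape as the rule: negative lookahead (group 1), then (.*) (group 2 = $2).
RULE = re.compile(r"^(?!.*?(" + "|".join(r"\." + e for e in EXTS) + r"))(.*)")

# First RewriteCond, with [NC]; Googlebot is not in this list.
BOTS = re.compile(
    r"baiduspider|facebookexternalhit|twitterbot|rogerbot|linkedinbot|embedly|"
    r"quora link preview|showyoubot|outbrain|pinterest|slackbot|vkShare|W3C_Validator",
    re.IGNORECASE,
)


def prerender_target(path, query, user_agent):
    """Proxy URL the rule would build, or None if it would not fire.

    path: per-directory path without leading slash; query: raw query string.
    """
    # [OR]: a listed user agent, or _escaped_fragment_ anywhere in the query string
    if not (BOTS.search(user_agent) or "_escaped_fragment_" in query):
        return None
    m = RULE.match(path)
    if m is None:
        return None  # path contains a listed extension: served as a static file
    target = "http://service.prerender.io/https://www.example.com/" + m.group(2)
    # No "?" in the substitution, so mod_rewrite passes the query string through.
    return target + ("?" + query if query else "")
```

Two things fall out of the model: Googlebot on its own does not trigger the rule (it is not in the user-agent list, so it depends on `_escaped_fragment_`), and because the lookahead looks for an extension anywhere in the path, a page such as `posts/node.js-intro` is treated as a static file and never proxied.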

So the .htaccess is actually correct. Here is Google's official answer.
Quote from http://productforums.google.com/forum/#!category-topic/webmasters/crawling-indexing--ranking/bZgWCJTnl08%5B1-25%5D by John Mueller (Google employee):
Looking at your blog's homepage, one thing to keep in mind is that the Fetch
as Googlebot feature does not parse the content that it fetches. So when you
submit toddmoyer.net/blog/ , it fetches that URL. After fetching the URL, it
doesn't parse it to check for the "fragment" meta tag, it just returns it to
you. However, if you fetch toddmoyer.net/blog/#! , then it should rewrite the
URL and fetch the URL toddmoyer.net/blog/?_escaped_fragment_= .
When we crawl and index your pages, we'll notice the meta-tag and act
accordingly. It's just the Fetch as Googlebot feature that doesn't check for
meta-tags, and instead just returns the raw content.
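The mapping Mueller describes, from a `#!` URL to an `?_escaped_fragment_=` URL, can be sketched as below. This is a simplified, hypothetical helper: the AJAX crawling scheme had its own rules for which characters to escape, and percent-encoding with `urllib.parse.quote` is only an approximation of them.

```python
from urllib.parse import quote


def escaped_fragment_url(url):
    """Rewrite a #! URL into its ?_escaped_fragment_= form (simplified)."""
    if "#!" not in url:
        return url  # no hashbang: nothing to rewrite
    base, fragment = url.split("#!", 1)
    # Append to an existing query string with "&", otherwise start one with "?".
    sep = "&" if "?" in base else "?"
    # Approximation: percent-encode the fragment, keeping "=" and "/" as-is.
    return base + sep + "_escaped_fragment_=" + quote(fragment, safe="=/")
```

For example, `escaped_fragment_url("toddmoyer.net/blog/#!")` returns `toddmoyer.net/blog/?_escaped_fragment_=`, the URL from the quote, which is what the `RewriteCond %{QUERY_STRING} _escaped_fragment_` line tests for.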

Related

mod_rewrite redirects with absolute path in URL

I am trying to use Apache mod_rewrite. The first thing I did was rewrite my URLs to an index.php file, which worked fine. But I thought I should remove the trailing slash(es) too, because I would prefer this to be handled by Apache instead of my PHP router.
Here's the whole content of my .htaccess file:
RewriteEngine on
# one of the attempts to remove trailing slash
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} (.*)/+$
RewriteRule ^(.*)/+$ $1 [R=301,L]
# This is the rewriting to my index.php (working)
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ index.php?/$1 [L]
The issue:
I read several questions about trailing-slash removal, but none of the answers worked for me.
With every answer I tried, I could reach my PHP router index.php (located in Phunder\public\) without a trailing slash:
// Requested URL | No redirection
http://localhost/projects/Phunder/public/home | http://localhost/projects/Phunder/public/home
But when I request the same page with a trailing slash, I get redirected to a URL that includes the absolute filesystem path:
// Requested URL | Wrong redirection
http://localhost/projects/Phunder/public/home/ | http://localhost/C:/xampp/htdocs/projects/Phunder/public/home
Other information:
I always clear my cache while testing.
Changing my last RewriteRule to RewriteRule ^(.*)/?$ index.php?/$1 [L] results in a 404 error for URLs with a trailing slash.
The current wrong redirection results in a 403 error.
I'm a beginner with mod_rewrite and (sadly) I don't always understand what I'm trying. Is there something I missed or misused? What should I do to get the expected behaviour?
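Redirect rules in .htaccess need either an absolute target (a full URL, or a path starting with /) or a RewriteBase. With a relative target such as $1, mod_rewrite puts the directory's filesystem path back in front of it, which is how C:/xampp/htdocs/ ends up in the redirect. You can also capture the full URI, leading slash included, from %{REQUEST_URI}, like this: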
RewriteEngine on
# one of the attempts to remove trailing slash
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} ^(.+)/+$
RewriteRule ^ %1 [R=301,NE,L]
# This is the rewriting to my index.php (working)
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ index.php?/$1 [L,QSA]
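A small Python model makes the difference between the two trailing-slash rules visible. It assumes the .htaccess sits in `/projects/Phunder/public/` and uses Python's `re` in place of PCRE; the helper names are hypothetical. A RewriteRule pattern in .htaccess sees only the path after that directory, so `$1` is relative, while `%{REQUEST_URI}` always starts with a slash.

```python
import re


def per_dir_path(request_uri, dir_prefix):
    """The part of the URI a RewriteRule pattern in .htaccess is matched against."""
    if not request_uri.startswith(dir_prefix):
        raise ValueError("URI is outside this .htaccess directory")
    return request_uri[len(dir_prefix):]


def original_target(request_uri, dir_prefix):
    # RewriteRule ^(.*)/+$ $1  -> $1 has no leading slash, i.e. it is relative
    m = re.match(r"^(.*)/+$", per_dir_path(request_uri, dir_prefix))
    return m.group(1) if m else None


def fixed_target(request_uri):
    # RewriteCond %{REQUEST_URI} ^(.+)/+$  then  RewriteRule ^ %1
    m = re.search(r"^(.+)/+$", request_uri)
    return m.group(1) if m else None
```

With several trailing slashes the greedy `(.+)` keeps all but the last one, so `/home//` is first redirected to `/home/` and then to `/home`. The root `/` never matches, and the `!-d` condition (not modeled here) leaves real directories alone.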

Allowing specific files to run over https

I have SSL set up on the server, but I hardly use it for anything, so I simply redirect all https:// requests to http:// via .htaccess. Recently I created a small app for my Facebook Fan Page but ran into a couple of problems, since Facebook no longer allows http://.
Question:
How can I let one static HTML file be served over https:// while leaving the rest of the setup as it is?
What I currently use in my .htaccess is the following:
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteCond %{HTTPS} on [OR]
RewriteCond %{HTTP:X-Forwarded-Proto} https [OR]
RewriteCond %{SERVER_PORT} ^443$
RewriteRule ^(.*)$ http://%{HTTP_HOST}%{REQUEST_URI} [L,R=301,NE]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-l
RewriteRule ^(.*)$ index.php/$1 [QSA,L,NC]
</IfModule>
Yes.
In your first block of conditions, add one that excludes your app. Put it before the existing conditions and without an [OR] flag, so that it is ANDed with the [OR] chain; with [OR] it would make the rule fire for every other request, including plain http:// ones, and cause a redirect loop. For instance:
RewriteCond %{REQUEST_URI} !facebook-app\.html
Replace facebook-app\.html with something specific that only appears in the URL of the file you want to exclude. Because the pattern is not anchored, it does not matter whether that name appears early or late in the URL.
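How the [OR] flag combines conditions decides whether the exclusion works. The sketch below models mod_rewrite's condition chaining in Python under our reading of its behaviour: a failing condition without [OR] stops the rule, and a matching [OR] condition skips the rest of its chain. The function names are hypothetical and `facebook-app.html` is only an example file name.

```python
def conditions_match(conds):
    """conds: list of (matched, has_or_flag) in the order they appear."""
    i = 0
    while i < len(conds):
        matched, has_or = conds[i]
        if has_or:
            if matched:
                # Skip the rest of this OR-chain, including its final member.
                while i < len(conds) and conds[i][1]:
                    i += 1
        elif not matched:
            return False  # a failing condition without [OR] stops the rule
        i += 1
    return True


def redirects_to_http(uri, https_on, forwarded_https, port, exclusion_has_or):
    """Would the https -> http rule fire, with the app exclusion listed first?"""
    not_app = "facebook-app.html" not in uri  # RewriteCond %{REQUEST_URI} !facebook-app\.html
    conds = [
        (not_app, exclusion_has_or),
        (https_on, True),         # %{HTTPS} on [OR]
        (forwarded_https, True),  # %{HTTP:X-Forwarded-Proto} https [OR]
        (port == 443, False),     # %{SERVER_PORT} ^443$
    ]
    return conditions_match(conds)
```

With the exclusion ANDed in, the app stays on https:// and other https:// requests are still sent to http://. Chaining the exclusion with [OR] instead makes plain http:// requests match as well, so they are redirected to themselves in a loop.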

.htaccess redirection of index.php

The code below is the .htaccess section that Joomla uses to generate SEF URLs when SEF is enabled in the back end:
## Begin - Joomla! core SEF Section.
#
RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization}]
#
# If the requested path and file is not /index.php and the request
# has not already been internally rewritten to the index.php script
RewriteCond %{REQUEST_URI} !^/index\.php
# and the request is for something within the component folder,
# or for the site root, or for an extensionless URL, or the
# requested URL ends with one of the listed extensions
RewriteCond %{REQUEST_URI} /component/|(/[^.]*|\.(php|html?|feed|pdf|vcf|raw))$ [NC]
# and the requested path and file doesn't directly match a physical file
RewriteCond %{REQUEST_FILENAME} !-f
# and the requested path and file doesn't directly match a physical folder
RewriteCond %{REQUEST_FILENAME} !-d
# internally rewrite the request to the index.php script
RewriteRule .* index.php [L]
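To check which URIs this block sends to index.php, the two REQUEST_URI conditions can be tried in Python. This is a sketch: `re` stands in for PCRE, and `routed_to_index` with its `is_file`/`is_dir` flags is a hypothetical stand-in for the `-f`/`-d` file tests.

```python
import re

# RewriteCond %{REQUEST_URI} !^/index\.php
INDEX = re.compile(r"^/index\.php")
# RewriteCond %{REQUEST_URI} /component/|(/[^.]*|\.(php|html?|feed|pdf|vcf|raw))$ [NC]
ROUTABLE = re.compile(r"/component/|(/[^.]*|\.(php|html?|feed|pdf|vcf|raw))$", re.IGNORECASE)


def routed_to_index(request_uri, is_file=False, is_dir=False):
    """True if the SEF block would internally rewrite this URI to index.php."""
    return (INDEX.search(request_uri) is None
            and ROUTABLE.search(request_uri) is not None
            and not is_file
            and not is_dir)
```

Extensionless paths and the site root are routed, while a path whose last segment has an unlisted extension (such as a missing `.png`) is left for Apache to answer with a 404.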
I have a simple problem that I can't get my head around.
I added this condition so that a request for domain.com/index.php is redirected to domain.com, and this should happen only when the path is exactly index.php.
However, this happens for all URLs.
RewriteCond %{THE_REQUEST} [^.]*|/(index.php)$
RewriteRule ^index.php$ / [R=301,L]
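It is due to the wrong regex you're using. [^.]*|/(index.php)$ is an alternation whose first branch, [^.]*, is unanchored and can match an empty string, so the condition is true for every request. Once Joomla has internally rewritten a URL to index.php, your rule therefore matches on the next pass and redirects it to /. Checking %{THE_REQUEST}, the original request line, for /index.php avoids this. Use this rule to remove index.php from URIs: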
RewriteCond %{REQUEST_URI} !/administrator [NC]
RewriteCond %{THE_REQUEST} /index\.php [NC]
RewriteRule ^(.*?)index\.php$ /$1 [L,R=302,NC,NE]
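The difference between the two conditions is easy to check in Python (a sketch; `re` stands in for PCRE and the helper names are made up). RewriteCond patterns are searched rather than fully matched, and an alternation with an empty-matching branch matches anything. THE_REQUEST holds the original request line, such as `GET /about HTTP/1.1`, which internal rewrites do not change.

```python
import re


def cond_matches(pattern, value, nocase=False):
    """RewriteCond patterns are searched anywhere in the string, not fully matched."""
    return re.search(pattern, value, re.IGNORECASE if nocase else 0) is not None


QUESTION_COND = r"[^.]*|/(index.php)$"  # from the question
ANSWER_COND = r"/index\.php"            # from the answer (used with [NC])


def strip_index(per_dir_path):
    """Redirect target of the answer's rule, or None if it does not match."""
    # RewriteRule ^(.*?)index\.php$ /$1 [NC]
    m = re.match(r"^(.*?)index\.php$", per_dir_path, re.IGNORECASE)
    return "/" + m.group(1) if m else None
```

For instance, `strip_index("blog/index.php")` gives `/blog/`, and the answer's condition ignores `GET /about HTTP/1.1` even after Joomla has rewritten that request to index.php internally.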

Rewrite rule with 2 variables with "/" separation

I have an .htaccess file located in a folder "/mixtapes/". I am trying to get the URL mydomain.com/music/downloads/mixtapes/this-title/id to execute mydomain.com/music/downloads/mixtapes/item.php?title=variable1&id=variable2.
I currently have the approach below somewhat working, but it only uses the id, and I need both variables (../mixtapes/title/id) separated by "/". Also, for some reason, with the code below the index page inside "/mixtapes/" does not work. I am stumped! I am somewhat new to this, and any help is greatly appreciated!
BTW, on my index page the link to the item.php page is rewritten to <a href="title/id">. I just can't seem to get it to properly execute item.php?title=a&id=b with the format mixtapes/title/id.
Current .htaccess file located in "/mixtapes/":
# turn mod_rewrite engine on
RewriteEngine On
# rewrite all physical existing file or folder
RewriteCond %{REQUEST_FILENAME} !-f [OR]
RewriteCond %{REQUEST_FILENAME} !-d
# allow things that are certainly necessary
RewriteCond %{REQUEST_URI} "/css/" [OR]
RewriteCond %{REQUEST_URI} "/images/" [OR]
RewriteCond %{REQUEST_URI} "/images/" [OR]
RewriteCond %{REQUEST_URI} "/javascript/"
# rewrite rules
RewriteRule ^mixtapes/item.php(.*) - [L]
RewriteRule ^(.*) item.php?id=$1 [QSA]
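The comments in your .htaccess are actually wrong (for example, !-f means "is NOT an existing file"), and RewriteCond lines only apply to the single RewriteRule that follows them. Here is a corrected version:
# turn mod_rewrite engine on
RewriteEngine On
# this file lives in the mixtapes folder, so patterns are matched against the path after "mixtapes/"
# if requested URL is NOT an existing file
RewriteCond %{REQUEST_FILENAME} !-f
# and NOT an existing directory (conditions are ANDed unless [OR] is used)
RewriteCond %{REQUEST_FILENAME} !-d
# CSS, images and JavaScript are existing files, so they are never rewritten
# rewrite /mixtapes/title/id to item.php?title=title&id=id
RewriteRule ^([^/]+)/([0-9]+)/?$ item.php?title=$1&id=$2 [L]
# catch all other non-existent paths and handle them (optional, default would be 404)
# conditions only apply to the next RewriteRule, so repeat them here
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^ index.php [L,QSA]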
I've assumed that your id is a numeric value.
Be careful with how you use the title in PHP. Don't output it directly, but you can use it to verify the URL and redirect wrong title/id combinations.
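Here is a Python model of an item rule written for the mixtapes folder (a hypothetical helper; `re` in place of PCRE, with `is_file`/`is_dir` standing in for the `-f`/`-d` tests). Since the .htaccess sits in the mixtapes folder, the rule sees `this-title/42` rather than `mixtapes/this-title/42`, so a pattern starting with `^mixtapes/` would not match.

```python
import re

# Pattern for a .htaccess inside the mixtapes folder: "mixtapes/" is already stripped.
ITEM = re.compile(r"^([^/]+)/([0-9]+)/?$")


def rewrite_item(per_dir_path, is_file=False, is_dir=False):
    """Internal target for <title>/<id>, or None if the rule would not apply."""
    # !-f and !-d without [OR]: both must hold, so real files and folders pass through
    if is_file or is_dir:
        return None
    m = ITEM.match(per_dir_path)
    if m is None:
        return None
    return "item.php?title=" + m.group(1) + "&id=" + m.group(2)
```

The original `!-f [OR] !-d` pair behaves differently: for an existing file, "not a file" is false but "not a directory" is true, so the ORed pair still passes and the file gets rewritten.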

htaccess: redirect old domain and all pages to a new domain

I know there are a lot of examples on Stack Overflow, but I'm still missing something.
I'm trying to redirect http://old.domain.com/fr/ to http://brand.new-domain.com/fr/ with the following rules, but that doesn't work:
# Enable Rewrite Engine
RewriteEngine On
RewriteBase /
# Add a trailing slash to paths without an extension
RewriteCond %{REQUEST_METHOD} !=POST
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_URI} !(\.[a-zA-Z0-9]{1,5}|/)$
RewriteRule ^(.*)$ $1/ [L,R=301]
# Redirect domain
Options +FollowSymLinks
RewriteCond %{HTTP_HOST} ^old.domain.com [OR]
RewriteCond %{HTTP_HOST} ^other-old.domain.com [NC]
RewriteRule ^(.*)$ http://brand.new-domain.com/$1 [r=301,L]
# Remove index.php
# Uses the "exclude method"
# http://expressionengine.com/wiki/Remove_index.php_From_URLs/#Exclude_List_Method
# This method seems to work best for us, you might also use the include method.
# http://expressionengine.com/wiki/Remove_index.php_From_URLs/#Include_List_Method
# Exclude root files
RewriteCond $1 !^(index\.php) [NC]
# Exclude EE folders
RewriteCond $1 !^(assets|ee-admin|images|templates|themes|fr|nl)/ [NC]
# Exclude user created folders
RewriteCond $1 !^(assets|css|img|js|swf|uploads)/ [NC]
# Exclude favicon, robots, iPad icon
RewriteCond $1 !^(favicon\.ico|robots\.txt|apple-touch-icon\.png) [NC]
# Remove index.php
RewriteCond %{QUERY_STRING} !^(ACT=.*)$ [NC]
RewriteCond %{QUERY_STRING} !^(URL=.*)$ [NC]
RewriteRule ^(.*)$ /index.php?/$1 [L]
It redirects correctly when I call the root URL, but not when I call a page. What am I doing wrong?
Thanks in advance!
Pv
When writing mod_rewrite rules, the rules get applied in the order that they appear.
To redirect an old domain to a new domain, you'll want that rule to be first in your .htaccess or httpd.conf file — all other rules should appear after it.
If you only want to redirect a certain directory, the following rule will do so, while allowing the rest of the site to function normally:
<IfModule mod_rewrite.c>
RewriteEngine On
# Redirect Only Matching Directories
RewriteCond %{REQUEST_URI} ^/(fr|fr/.*)$
# in .htaccess, $1 already starts with "fr/", so don't repeat /fr/ in the target
RewriteRule ^(.*)$ http://brand.new-domain.com/$1 [R=301,L]
</IfModule>
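The capture in this directory rule can be checked with a short Python sketch (hypothetical helper; `re` in place of PCRE). It assumes the rule sits in the site's root .htaccess, where the RewriteRule sees the path without its leading slash. Because `$1` already begins with `fr/`, the target only needs the bare domain in front of it; putting another `/fr/` in the target would produce `/fr/fr/...`.

```python
import re

NEW_SITE = "http://brand.new-domain.com/"


def fr_redirect(request_uri):
    """Redirect target for the /fr rule in a root .htaccess, or None."""
    # RewriteCond %{REQUEST_URI} ^/(fr|fr/.*)$
    if re.search(r"^/(fr|fr/.*)$", request_uri) is None:
        return None
    # The RewriteRule sees the path without the leading slash, so ^(.*)$
    # captures e.g. "fr/contact": $1 already includes the fr segment.
    dollar1 = re.match(r"^(.*)$", request_uri[1:]).group(1)
    return NEW_SITE + dollar1
```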
If you want to redirect the entire site, the following rule will do so:
<IfModule mod_rewrite.c>
RewriteEngine On
# Redirect Entire Site to New Domain
RewriteCond %{HTTP_HOST} ^old.domain.com$ [OR]
RewriteCond %{HTTP_HOST} ^other-old.domain.com$ [NC]
RewriteRule ^(.*)$ http://brand.new-domain.com/$1 [R=301,L]
</IfModule>
If you care about letting crawlers know your content has moved and want to make the transition as seamless as possible, be sure to keep the 301 Redirect flag in the RewriteRule.
This will ensure that users and search engines are directed to the correct page.
While we're on the subject, as part of the EE 2.2 release, EllisLab now "officially" offers limited technical support for removing index.php from ExpressionEngine URLs.
Simply add or update your code to the following, making sure to consider any rules you may already have in place:
<IfModule mod_rewrite.c>
RewriteEngine On
# Removes index.php
RewriteCond $1 !\.(gif|jpe?g|png)$ [NC]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ /index.php/$1 [L]
# If 404s, "No Input File" or every URL returns the same thing
# make it /index.php?/$1 above (add the question mark)
</IfModule>
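One detail of this rule: `RewriteCond $1` refers to the RewriteRule's own capture group, which works because mod_rewrite matches the rule's pattern before it checks the conditions. A Python sketch of that order (hypothetical helper; `re` in place of PCRE, `is_file`/`is_dir` standing in for `-f`/`-d`):

```python
import re


def ee_rewrite(per_dir_path, is_file=False, is_dir=False):
    """Internal target of the EE rule, or None if it would not apply."""
    # 1. The RewriteRule pattern is matched first, which is what defines $1.
    dollar1 = re.match(r"^(.*)$", per_dir_path).group(1)
    # 2. Then the conditions: RewriteCond $1 !\.(gif|jpe?g|png)$ [NC]
    if re.search(r"\.(gif|jpe?g|png)$", dollar1, re.IGNORECASE):
        return None
    # RewriteCond %{REQUEST_FILENAME} !-f and !-d
    if is_file or is_dir:
        return None
    return "/index.php/" + dollar1
```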
Try using the following rule as the first one:
# Redirect domain
Options +FollowSymLinks
RewriteCond %{HTTP_HOST} ^old.domain.com [OR]
RewriteCond %{HTTP_HOST} ^other-old.domain.com [NC]
RewriteRule ^(.*)$ http://brand.new-domain.com/$1 [R=301,L]
Also note the uppercase R, which is the short form of the redirect flag.
Have you tried mod_alias's simple Redirect directive (mod_alias is a core module that you already have) before resorting to the more complex mod_rewrite approach?
I would create a VirtualHost with ServerName old.domain.com and add this directive to it:
Redirect /fr http://brand.new-domain.com/fr
From the documentation:
Then any request beginning with URL-Path will return a redirect request to the client at the location of the target URL. Additional path information beyond the matched URL-Path will be appended to the target URL.
So set up a separate VirtualHost for brand.new-domain.com (with ServerName brand.new-domain.com) and do not add the Redirect there.
If you still want to handle both domains in the same VirtualHost, you'll have to use mod_rewrite, since even RedirectMatch cannot check the requested domain.
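The prefix behaviour quoted from the documentation can be modeled as follows. This is a simplified, hypothetical sketch: the segment-boundary check reflects our understanding of how mod_alias matches a URL-path, and query-string handling is left out.

```python
def alias_redirect(path, url_path="/fr", target="http://brand.new-domain.com/fr"):
    """Simplified model of: Redirect /fr http://brand.new-domain.com/fr"""
    if not path.startswith(url_path):
        return None
    rest = path[len(url_path):]
    # Assumption: the prefix must end on a path-segment boundary,
    # so /fr matches /fr and /fr/... but not /french.
    if rest and not rest.startswith("/") and not url_path.endswith("/"):
        return None
    # Additional path information beyond the URL-path is appended to the target.
    return target + rest
```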