How do I get rid of the ? (question mark) in the URL that identifies the start of the query string? - coldfusion

I am using ColdFusion 9.1.2.
I set up a new web site that parses the query string after the domain name and slash. What is left is the MusicianID and then a string used to help with SEO. The URL looks like this:
http://awesomealbums.info/?1085/jim-croce
http://awesomealbums.info/?1077/james-taylor
When I share it using Facebook, Facebook removes the question mark and encodes it. They can't seem to parse it so they display it as the home page.
These throw an error that I can't control:
http://awesomealbums.info/1085/jim-croce
http://awesomealbums.info/1077/james-taylor
I notice that StackOverlfow and other sites are able to exclude the question mark that starts a query string. I would like to do the same. I, however, can't change any IIS or CF Administrator settings. I need to code the solution. I've tried, but I get IIS telling me they can't find the page.
I want my URLs to look like this:
http://awesomealbums.info/1085/jim-croce // same as above but no ?
http://awesomealbums.info/1077/james-taylor // same as above but no ?
Here's the code that I am using right now to parse the URL and get the MusicianID.
<cfscript>
QString = CGI.QUERY_STRING;
if (QString eq "") {
include "Home.cfm";
} else if (QString eq "WhoAmI") {
include "WhoAmI.cfm";
} else {
IndexOfSlash = Find("/", QString);
if (IndexOfSlash eq 5) {
ThisID = left(QString, 4);
if (isNumeric(ThisID)) {
MusicianID = ThisID;
include "Musician.cfm";
}
} else {
location(url="http://www.awesomealbums.info" addtoken="false");
}
}
How can I alter my site so that the question mark can be removed and the web server doesn't get funky and I can parse out the query string?

The keyword you are looking for is URL rewriting. It has to happen on the web server, since you want to handle all requests in the top-level directory. If your web server is the Apache httpd, you can do it like this:
RewriteEngine on
RewriteRule ^/(\d{4})/([\w-]+)$ /?$1/$2 [L]
or
RewriteRule ^/(\d{4})/([\w-]+)$ /Musician.cfm?MusicianID=$1 [L]

Since you can't modify the web server (as Roland correctly suggests) there is one alternative - use URLs that look like this:
http://awesomealbums.info/index.cfm/1085/jim-croce
Structured this way, the webserver (IIS) will still pass control to your script. Then you can start having CF take over control of the processing. Your CGI.query_string will be empty, but your cgi.path_info variable will contain /1085/jim-croce. You can then start parsing that and handling it as needed.

The answer for IIS is basically the same as Roland mentions in his answer. You need to use URL Rewriting, that is the only way to accomplish what you are looking to do. This is because technically the URL that you want to request, does not exist as a real page or resource on the server, and you need to use URL Rewriting to intercept the page request, map it using regular expressions, then pass it to your application as the query string (page parameters) that you are expecting. So, if you are doing this on a hosted server, contact your host and see if they have something setup or installed on the server for doing URL Rewriting. Most any decent host certainly will.
If you are using IIS7, then info on using the built in URL Rewriting can be found at the link:
http://learn.iis.net/page.aspx/460/using-the-url-rewrite-module/
If you are using an older version of IIS, then you need to install an application on the server that will do this as previous versions of IIS do not have built in support for regular expression based rewriting that you would need to properly map your URLs to the correct parameters on your query string. For older versions of IIS, I've used Helicon IsapiRewrite which you can find at:
http://www.isapirewrite.com/

riding off of jfeasel's answer, you can use ColdCourse:
http://coldcourse.riaforge.org/

Related

Replacing part of ${url}'s from a sitemap in Jmeter

I have a jmeter test plan that goes to a site's sitemap.xml page, retrieves each url on that page with an XPath Extractor, then passes ${url} to a HTTP Request sampler within a ForEach Controller to send the results for each page to a file. This works great, except I just realized that the links on this sitemap.xml page are hardcoded. This is a problem when i want to test https://staging-website.com, but all of the links on sitemap.xml are all www.website.com pages. It seems like there must be a way to replace 'www.website.com' in each ${url} with 'staging-website.com' with regex or something, but I haven't been able to figure out how. Any suggestions would be greatly appreciated.
Add a BeanShell pre-processor to manipulate the url.
String sUrl = vars.get("url");
String sNewUrl = sUrl.replace("www.website.com", "https://staging-website.com");
log.info("sNewUrl:" + sNewUrl);
vars.put("url", sNewUrl);
You can also try to correlate the sitemap.xml with the regular expression extractor positioned till www.website.com so that you extract only the URL portion of the data instead of the full host name. Shouldn't you be having it already since the HTTPSampler only allows you to enter the URI segment and not the host name?
You can use __strReplace() function available via JMeter Plugins project like:
${__strReplace(${url},${url},staging-website.com,)}
Demo:
The easiest way to install JMeter Custom Functions (as well as any other plugins) is using JMeter Plugins Manager
I was able to replace the host within the string by putting
${__javaScript('${url}'.replace('www.website'\,'staging.website'))}
in the path input of the second http request sampler. The answers provided by Selva and Dimitri were more elegant, so if I have time in the future to come back to this I will give them another try. I really appreciate the help!

Alternative to <!--#include virtual="somefilename"-->

I have a website running an an old apache server with SSI enabled. My host wants to move to a new server which has SSI disabled for security reasons.
I have a whole lot of pages with Google Friendly urls which just have one line
<!--#include virtual="Url_Including_Search_String"-->
What is the best alternative to the SSI to keep my google friendly search strings returning the specified search result?
I can achieve most of the results with rewrite rules in the .htaccess file, however some search strings have a space in the keyword but the url doesn't. I can't do this with a rewrite rule
ie www.somedomain.com.au/SYDNEY.htm would have
<!--#include virtual="/search.php?keyword=SYDNEY&Submit=SEARCH"-->
However,the issue is
www.somedomain.com.au/POTTSPOINT.htm would have
<!--#include virtual="/search.php?keyword=POTTS+POINT&Submit=SEARCH"-->
A rewrite rule cannot detect where a space should be in a Suburb name, so hoping there is an alternative for <!--#include virtual=
I have looked at RewriteMap but don't think I can access the file I would need to put this in.
I would use Mod Rewrite to redirect any calls to non-existent files to your Search page.
For example:
http://example.com/SYDNEY redirects to
http://example.com/search.php?q=SYDNEY
(assuming there is not actually a /SYDNEY/ file at your server root.)
Then get rid of all those individual redirect pages.
As for the spaces, I'd modify my actual Search page to recognize (for example) "POTTSPOINT" and figure out that the space should be inserted. Basically compare the search term against a database of substitutions.

Use RegEx to redirect using data from files

Recently, we restructured a large site of one of our customers. This caused all the news-articels on that site to be on a different place. Problem is that the google cache is still showing them on the old location, leading to A LOT of 404 not founds ( its about 1400 news entries ).
Normally, a redirect using somewhat simple regex would be fine, but not only the path to the news did change, but also some parameters. Example:
Old Url:
http://www.customers-url.com/old/path/to/the/news/details/?tx_ttnews%5Btt_news%5D=67&cHash=a782f3027c4d00face6df122a76c38ed
How the new url should look like:
http://www.customers-url.com/new/path/to/news/?tx_news_pi1%5Bnews%5D=65
As you can see, the parameter D did change from 67 to 65 and the part of the URL before the ? did also change. Also, tx_ttnews has changed to tx_news and tt_news changed to news and the &cHash part did fall away completely.
I have the changed ids in a csv in the following format:
old_id, new_id
1,2
2,3
...etc...
Same goes the the changed url part before the ?. Since im not exactly an expert in using regex my question is:
Can this be done inside the .htaccess using RegEx ( not sure if it can even use a file as input)? How complicated is that? And how would such a regular expression look like?
Rather than trying to use .htaccess, it would be easier to manage and easier to code if you simply make a new page that responds on the old url (/old/path/to/the/news/details), then make that page build the new url and return a 301 to the browser with that new url.

Must one use an MVC FrameW to produce clean URL's?

The host that I want to host with does not support server side url rewriting, thus no third party tools can be installed to rewrite the url's.
This is Coldfusion 8, on windows, IIS.
The other alternative that I know of is to use a framework, but I do not feel like taking that route (time), for the application works well as it is (but the URL).
Can clean urls be generated by purely CF?
I do not need the clean url's for seo, rather it will be for the user's easy reference to their page. E.g. youtube.com/userpage
Any sugessions?
If the only choice is to use a framework, then which one is most compatible with traditional cfml'', cfm's & CFC's? In that there needs to be minimum changes to the code in the conversion from the none frameworked app to become frameworked.
Thanks for you contributions!
No. you do not need a framework or URL rewriter to get http://domain.com/some/url to work (notice no index.cfm).
In IIS you can set up custom error pages for 404 errors. Make the custom error page execute a ColdFusion page on your server (/urlhandler.cfm, 404.cfm or index.cfm for example). Within that page, you can control your own routes with ColdFusion by using list methods on the cgi.query_string value. IIS will provide you a url that looks something like 404;http://domain.com/the/original/url which you can parse to route the visitor to your desired event.
<!--- Get URL String --->
<cfset CurrentURL = ListGetAt(cgi.query_string, 2, ";")>
<cfset CurrentURL = Replace(CurrentURL, ":80", "")>
<cfset CurrentURL = Replace(CurrentURL, ":443", "")>
<cfset CurrentURL = Replace(CurrentURL, "403;", "")>
<cfset CurrentURL = Replace(CurrentURL, "'", "", "ALL")>
We have a site that receives approx a million visitors a month that is still running SES urls with this method. I was shocked when I was hired and found this existing code at the heart of the site and would not elect to repeat it, but, if you have limitations on installing a rewriter or third party framework (this client placed restrictions on the site) this solution may work for you.
By playing with the above code, you can quickly see how you may use CF to dynamically include the .CFM file you want or execute the right CFC code depending on your set up.
You can use the Framework/1 framework or CFWheels to achieve clean URL's but it will need to include the "/index.cfm/" at the beginning of the URL in order to trigger ColdFusion's application handler code.
EDIT: Please see Aaron Greenlee's work around to prevent the "index.cfm" from appearing in the URL.
i.e. Whichever approach you take, if you cannot add a 3rd party tool to rewrite URLs (and not using Apache), your URL's will be in the form of http://site.com/index.cfm/section/item
eg.
http://site.com/index.cfm/user/login
http://site.cfm/index.cfm/user/signup
FW/1 offers the option of passing in URL variables in a search engine friendly format as well.
Examples:
http://site.com/index.cfm/user/login/email/me#you.com/password/test
is the same as
http://site.com/index.cfm/user/login?email=me#you.com&password=test
is the same as
http://site.com/index.cfm?action=user.login&email=me#you.com&password=test
Go ahead and learn a framework. Most will work for this. However if you just do not want to learn a framework.
www.mysite.com/products/
will run:
www.mysite.com/products/index.cfm
www.mysite.com/products/books
will run:
www.mysite.com/products/books/index.cfm
Framework/1 and CFZen will work for this a they are very simple 1 file frameworks that you can just work around.
CFZen
http://cfzen.riaforge.org
http://cftipsplus.com/blog/?tag=cfzen
Framework/1
http://fw1.riaforge.org
http://corfield.org - the author of Framework/1
Instead of using http://yoursite.com/index.cfm/user, you can do http://yoursite.com/user.cfm and catch the error in the OnMissingTemplate function of application.cfc. This will not require you to set a custom 404 page.

With Coldfusion, how do you handle dynamicaly generated URLs?

(Update: I converted this question to a community wiki as the answer appears more subjective than I thought it would. There are multiple answers depending on one's needs.)
If I have a folder that only includes application.cfc and index.cfm, what is a fast, reliable method to handle dynamically generated URLs? i.e. URLs that do not have a corresponding physical .cfm file.
This example url generates a 404, but it should lookup a page in a db and return it via index.cfm:
http://www.myserver.com/cfdemo/mynewpage.cfm
Should I use onMissingTemplate() in the application.cfc to handle the missing file? Since this method doesn't process onRequestStart(), onRequest() and onRequestEnd(), I wonder if it should be avoided.
Alternately, I could setup an ISAPIRewrite rule since I'm using IIS (or mod_rewrite on Apache)
# IF the request is not /index.cfm, doesn't exist and ends in cfm or html,
# rewrite it. Pass the requested filename $1.$2 as the 1st param: cgi.page
# append the remaining url params $4 ($3 is the ?)
RewriteCond %{SCRIPT_NAME} ^(?!/index.cfm)(.*)$
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^\/(.*)\.(cfm|html)(\??)(.*)$ /index.cfm?page=$1.$2&$4 [I,L]
Are these methods appropriate, or am I missing a better way of accomplishing this goal? It seems that Coldfusion should have this type of feature built into the application.cfc. Maybe I'm just missing it.
nothing wrong with url rewrite on web server level. I'd vote for that.
Because CF by default handles only cfm/cfc requests, you can do in the beginning of Application.cfc something like this:
<cfif Right(cgi.SCRIPT_NAME, 9) NEQ "index.cfm">
<!--- analyze the SCRIPT_NAME and start processing --->
</cfif>
For other filetypes using web-server configuration is the only way I can see. But instead of creating rewriting rules you can try to use custom 404 handlers. At least when using IIS you'll be able to get the context in cgi.QUERY_STRING, if set up the dummy page, say 404.cfm (it does not need to exist) and putting following check before previous example:
<!--- trap 404 requests triggered by IIS --->
<cfif right(cgi.SCRIPT_NAME, 7) EQ "404.cfm">
<cflog file="mylogfile" text="404 error triggered by IIS. Context: #cgi.QUERY_STRING#">
</cfif>
For Apache it is possible to use following handler, but I'm not sure if you can extract the context in this case:
ErrorDocument 404 /404.cfm
If you are doing this for SES URLs, I'd offer two pieces of advice.
The first is that they matter less and less as time goes on. Google, for example, recognizes that URLs need to include query data.
Second: CF can natively handle SES URLs in the form hostname/file.cfm/param1/param2. Ray Camden's BlogCFC, for example, works that way. It is on by default in CF8, but needs to be enabled in CF7. I don't have a lot of information handy on this, but it should be easy to Google (or Bing, or whatever).
If you can allow it, I'd try to convert URLs like:
http://www.myserver.com/cfdemo/mynewpage.cfm
to:
http://www.myserver.com/cfdemo/mynewpage OR
http://www.myserver.com/index.cfm/cfdemo/mynewpage
so that you don't lose the onRequest methods. The first one can be done only at the webserver level, so in Apache or IIS. The second one can be done in just ColdFusion. See this: http://www.cfcdeveloper.com/index.cfm/2007/4/7/Coldfusion-SES-URL.
Otherwise, if you must have the .cfm at the end, you can use a URL rewrite package in Apache or IIS to strip it out and then forward the request to a cfm page or do what you're doing with onMissingTemplate. I'd try to opt for a solution that doesn't involve losing the onRequest methods, but up to you.
I'd definitely go for URL rewriting. Not only will it be a more predictable, yet generalized approach, but it reduces a significant amount of string parsing load from the CF server. Further, it results in CF handling a request to a real file thereby getting you the benefit of onapplicationstart, onrequeststart, and other events.
As an aside, I've personally always found URLs like /index.cfm/foo/bar/ to look unpro and hackish. Additionally, URLs (like /foo/bar) that don't end in either a file extension or trailing slash are technically incorrect (per old-school static site conventions at the very least) and ought to probably be avoided as well. I'd also be curious where Ben Doom gets his assertion that "The first is that they matter less and less as time goes on. Google, for example, recognizes that URLs need to include query data." In my experience I've actually found the exact opposite to be true.