URL routing works on one website, not another. 100% processor usage - regex

I thought I had URL routing under control, as it works well on one website, but I found out it is not working properly on another. Both websites run on the same server, with the same IIS 6.0 and the same asp_isapi.dll.
SETUP 1: This one works well:
routes.MapPageRoute("Article SEO",
"sid/{sid}",
"~/ar.aspx",
true,
new RouteValueDictionary { },
new RouteValueDictionary { { "sid", @"^([a-zA-Z\-]*)+([a-zA-Z])+(.html)$" } }
);
SETUP 2: But this one, which is very similar, is not working well:
routes.MapPageRoute("Article",
"page/{sid}",
"~/page.aspx",
true,
new RouteValueDictionary { },
new RouteValueDictionary { { "sid", @"^([a-zA-Z0-9\-]*)+([a-zA-Z0-9])+(.html)$" } }
);
Testing the regexes in The Regex Coach shows that they are written correctly, meaning they both accept the good strings and reject the wrong ones.
The URL I use for the second one is http://address/page/some-html-keywords.html. If I specify a URL like this, it works well.
The problem is that if I change the .html extension to something like .htmls or .anything, it completely kills the web server: I get 100% processor usage. I don't understand why or how, and I don't have this problem with the first setup. There I can change the URL to whatever I want, and it either shows the page (because the format is correct) or shows a 404 Not Found page.
Some examples:
http://address/page - 404 page, working correctly
http://address/page/test.html - accepted, working correctly
http://address/page/testing_#.html - 404 page, working correctly
http://address/page/test.htmls - won't show page, hanging, 100% processor usage, not working correctly
http://address/page/test.whatever - won't show page, hanging, 100% processor usage, not working correctly
http://address/page/page.aspx - redirects, working correctly
The same setup (with a different regex check) works well on the other website in the same IIS 6.0. Both use the same asp_isapi.dll file.
I just don't get it. I have tried commenting out all the code in page.aspx to find out whether the problem was in page.aspx itself, but it doesn't matter: it hangs with an empty page as well. So the problem must be with URL routing, the ISAPI DLL, or IIS. But the other website on the same IIS and the same machine simply works.
Any opinions?
Thank you
Fero

I don't know anything about URL routing, but I notice that the regular expression you specify
@"^([a-zA-Z0-9\-]*)+([a-zA-Z0-9])+(.html)$"
looks essentially the same in both code samples (the second only adds 0-9 to the character classes) and, again in both examples, ends with a trailing $ (end of string), which prevents anything NOT ending in .html from being matched by that regular expression. To get .htmls you need (.html.*)$; to get .anything you need something like
@"^([a-zA-Z0-9\-]*)+([a-zA-Z0-9])+\.[a-zA-Z0-9]*$"
Also, it would probably be a good idea to escape the '.' just before html, as in \.html, because regular expressions treat an unescaped '.' as any single character, not only a literal '.'.
I hope this helps.
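The answer's points can be checked outside IIS with a small Python sketch (Python's re module rather than .NET's Regex, so treat it as an approximation). It shows that the unescaped '.' lets the original pattern accept a string like testXhtml, and it uses a flattened pattern: ([A]*)+ accepts the same strings as [A]*, and [A]*[B]+ the same strings as [A]*[B] when B is a subset of A. One plausible explanation for the 100% CPU, not confirmed in this thread, is that nested quantifiers such as ([...]*)+ make the regex engine backtrack through exponentially many ways of splitting a near-miss string; the flattened form has no such nesting. The names ORIGINAL, FLATTENED and accepts are illustrative only.

```python
import re

# SETUP 2's pattern exactly as written: unescaped '.', nested quantifiers.
# Caution: avoid feeding it long strings that almost match, since nested
# quantifiers like ([...]*)+ can make backtracking blow up.
ORIGINAL = r"^([a-zA-Z0-9\-]*)+([a-zA-Z0-9])+(.html)$"

# Flattened form: ([A]*)+ accepts the same strings as [A]*, and [A]*[B]+
# accepts the same strings as [A]*[B] because B is a subset of A.
# The '.' before html is escaped so it only matches a literal dot.
FLATTENED = r"^[a-zA-Z0-9-]*[a-zA-Z0-9]\.html$"


def accepts(pattern: str, sid: str) -> bool:
    """Return True if the whole {sid} segment matches the pattern."""
    return re.match(pattern, sid) is not None


if __name__ == "__main__":
    for sid in ["test.html", "some-html-keywords.html", "testXhtml", "test.htmls"]:
        print(sid, accepts(ORIGINAL, sid), accepts(FLATTENED, sid))
```

With these four inputs the two patterns disagree only on testXhtml, which ORIGINAL accepts and FLATTENED rejects.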

Learn how to analyze the root cause of high CPU usage in ASP.NET, and then you will find out why:
http://blogs.msdn.com/b/tess/archive/2008/02/22/net-debugging-demos-lab-4-high-cpu-hang.aspx

Related

Regex specific question and search function on my website dealing with broken links

I've been trying to figure out my regex pattern but it doesn't seem to be working for me.
Here's what I'm trying to do:
My website has broken links when someone accidentally lands on a page like this:
https://example.com/catalogsearch/result/?q=
or
https://example.com/catalogsearch/result/
So I'm redirecting them back to my homepage. The problem is that now the search sends everything back to the homepage. Obviously, if there is something after the equals sign, the search needs to continue, as in:
https://example.com/catalogsearch/result/?q=person
but currently I can't figure this out.
Here is the regex I've been messing with for quite some time now. It still seems to be wrong, or something else is wrong with my search:
"^/catalogsearch/result((/)|(/\\?)|(/\\?[a-z])|(/\\?[a-z]=))?$"
Please forgive me; I'm horrible with regex.
After a lot of discussion, we concluded that routes.yaml matches on the URL path as a valid route but not on the query string part. Hence, of the two examples in the post, one can be handled with
"/catalogsearch/result": { to: "https://example.com/", prefix: false }
For the other one, change the nginx config to redirect to the homepage, or, if that is not possible, check with Magento support on how to incorporate the query string part in the routes.yaml file.
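Because route matching only sees the path, the "empty search" case has to be decided by inspecting the query string separately. As a language-neutral illustration (Python here; the real check would live in the nginx config or the application), a hypothetical helper might look like this, assuming a bare /catalogsearch/result with a missing or empty q is the only case that should go to the homepage:

```python
from urllib.parse import parse_qs, urlsplit


def should_redirect_home(url: str) -> bool:
    """Hypothetical helper: True only for /catalogsearch/result URLs
    whose q parameter is missing or empty."""
    parts = urlsplit(url)
    # The route match sees only the path; tolerate an optional trailing slash.
    if parts.path.rstrip("/") != "/catalogsearch/result":
        return False
    # parse_qs drops blank values by default, so "?q=" yields no "q" key.
    query = parse_qs(parts.query)
    return not query.get("q")
```

A real search such as ?q=person keeps q, so the helper returns False and the search proceeds.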

Use RegEx to redirect using data from files

Recently, we restructured a large site for one of our customers. This moved all the news articles on that site to a different location. The problem is that the Google cache still shows them at the old location, leading to A LOT of 404 Not Found errors (there are about 1400 news entries).
Normally a redirect using a fairly simple regex would be fine, but not only did the path to the news change, some parameters changed as well. Example:
Old Url:
http://www.customers-url.com/old/path/to/the/news/details/?tx_ttnews%5Btt_news%5D=67&cHash=a782f3027c4d00face6df122a76c38ed
What the new URL should look like:
http://www.customers-url.com/new/path/to/news/?tx_news_pi1%5Bnews%5D=65
As you can see, the news ID (the value of tx_ttnews%5Btt_news%5D, i.e. tx_ttnews[tt_news]) changed from 67 to 65, and the part of the URL before the ? changed as well. Also, tx_ttnews changed to tx_news, tt_news changed to news, and the &cHash part was dropped completely.
I have the changed ids in a csv in the following format:
old_id, new_id
1,2
2,3
...etc...
The same goes for the changed URL part before the ?. Since I'm not exactly a regex expert, my question is:
Can this be done inside .htaccess using regex (I'm not sure it can even use a file as input)? How complicated would that be? And what would such a regular expression look like?
Rather than trying to use .htaccess, it would be easier to manage and easier to code if you simply make a new page that responds to the old URL (/old/path/to/the/news/details), builds the new URL, and returns a 301 redirect to that new URL.
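A minimal sketch of the URL-building part of that page, written in Python and assuming the CSV has the old_id, new_id header shown in the question and that the new path is fixed. NEW_BASE, load_id_map and new_url_for are hypothetical names; the page itself would still need to send a 301 status with the returned URL in the Location header (or a 404 when the function returns None).

```python
import csv
import io
from typing import Dict, Optional
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical target location; adjust to the real new news path.
NEW_BASE = "http://www.customers-url.com/new/path/to/news/"


def load_id_map(csv_text: str) -> Dict[str, str]:
    """Read 'old_id, new_id' rows into a dict, skipping the header row."""
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    next(reader, None)  # skip the "old_id, new_id" header
    return {row[0]: row[1] for row in reader if len(row) >= 2}


def new_url_for(old_url: str, id_map: Dict[str, str]) -> Optional[str]:
    """Build the new news URL, or return None if the old ID is unknown.

    parse_qs decodes %5B/%5D, so the old parameter is read as
    tx_ttnews[tt_news]. cHash is simply not carried over.
    """
    query = parse_qs(urlsplit(old_url).query)
    old_id = query.get("tx_ttnews[tt_news]", [""])[0]
    new_id = id_map.get(old_id)
    if new_id is None:
        return None
    # urlencode percent-encodes the brackets back to %5B and %5D.
    return NEW_BASE + "?" + urlencode({"tx_news_pi1[news]": new_id})
```

Keeping the mapping in the CSV means the 1400 entries never have to be turned into rewrite rules.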

URL redirect plugin regex input for match and target

I'm panicking a little, so sorry if I haven't explained well enough.
I've been through quite the nightmare of a permalink restructuring:
Old permalink: sitename/archives/postid
Desired new: sitename/postname
I've tried everything, it seems. I've even dabbled with /?p=$1 (that nonsense!). But now I'm getting a crazy error when I go to a URL in my old permalink structure. It reads:
Oops! Google Chrome could not connect to 0.0.37.89
Suggestions:
Try reloading: 0.0.37.89
And this was supposed to be "redirected".
I give up. Please help.
Site name: brightontheday.com
I used the Redirection plugin to redirect all old URL permalinks (/archives/postID) to the new permalink (/postID/postname).
Also, the issue appeared to be due to caching via Cloudflare. It's important to note that you should put Cloudflare in "development mode" while making site-wide changes.
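In the Redirection plugin, a regex source captures the post ID and the target refers to it as $1. The same mapping in Python (where the captured group is read with group(1)) looks roughly like this. It assumes old posts live at /archives/<numeric id> and that WordPress forwards /?p=<id> to the post's current permalink, which is its usual canonical-redirect behaviour but worth verifying on the site; redirect_target is a hypothetical helper.

```python
import re
from typing import Optional

# Old structure: /archives/<post id>, optionally with a trailing slash.
OLD_PERMALINK = re.compile(r"^/archives/(\d+)/?$")


def redirect_target(path: str) -> Optional[str]:
    """Map /archives/123 to the site-relative /?p=123, or None if no match."""
    m = OLD_PERMALINK.match(path)
    if m is None:
        return None
    # Keep the target site-relative (leading '/') so the browser stays on the same host.
    return "/?p=" + m.group(1)
```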

Will a YouTube address with only the video code usually work?

While setting up a part of a site where users can enter the URL of a YouTube video and have it play in a modal window (currently using fancybox), I have found that some URLs don't work, even with the fancybox regex for the href.
However, it appears that just extracting the video's code and appending it to a basic URL stem works, such as:
http://www.youtube.com/watch?v={video code}
as in http://www.youtube.com/watch?v=OMYzsAurxHY
I run the submitted URL through a series of ColdFusion regexes to get the code, and then append it to the "http://www.youtube.com/watch?v=" stem.
I'm not having any problems; I just want to know whether anyone more familiar with YouTube sees a problem with assuming that this stem plus the video code will always work, or at least work a great percentage of the time.
Thanks
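The extraction step described above can be sketched in Python, since the ColdFusion regexes themselves aren't shown. It assumes the common URL shapes listed in the comments and that video IDs are 11 characters from [A-Za-z0-9_-], which matches the ID in the question but is an observation rather than a documented guarantee; canonical_watch_url is a hypothetical name.

```python
import re
from typing import Optional

# Assumed URL shapes; this list is illustrative, not exhaustive.
_PATTERNS = [
    re.compile(r"[?&]v=([A-Za-z0-9_-]{11})"),        # .../watch?v=ID or ...&v=ID
    re.compile(r"youtu\.be/([A-Za-z0-9_-]{11})"),     # short links
    re.compile(r"/(?:embed|v)/([A-Za-z0-9_-]{11})"),  # embed and old /v/ URLs
]

STEM = "http://www.youtube.com/watch?v="


def canonical_watch_url(submitted: str) -> Optional[str]:
    """Extract the video code and append it to the watch URL stem."""
    for pattern in _PATTERNS:
        m = pattern.search(submitted)
        if m:
            return STEM + m.group(1)
    return None  # no recognizable video code
```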

How do I get rid of the ? (question mark) in the URL that identifies the start of the query string?

I am using ColdFusion 9.1.2.
I set up a new website that parses the query string after the domain name and slash. The query string contains the MusicianID followed by a string used to help with SEO. The URLs look like this:
http://awesomealbums.info/?1085/jim-croce
http://awesomealbums.info/?1077/james-taylor
When I share it on Facebook, Facebook removes the question mark and encodes it. It can't seem to parse the result, so it displays the link as the home page.
These throw an error that I can't control:
http://awesomealbums.info/1085/jim-croce
http://awesomealbums.info/1077/james-taylor
I notice that Stack Overflow and other sites are able to omit the question mark that starts a query string. I would like to do the same. However, I can't change any IIS or CF Administrator settings, so I need to code the solution. I've tried, but IIS tells me it can't find the page.
I want my URLs to look like this:
http://awesomealbums.info/1085/jim-croce // same as above but no ?
http://awesomealbums.info/1077/james-taylor // same as above but no ?
Here's the code that I am using right now to parse the URL and get the MusicianID.
<cfscript>
// The whole query string after "?", e.g. "1085/jim-croce"
QString = CGI.QUERY_STRING;
if (QString eq "") {
    include "Home.cfm";
} else if (QString eq "WhoAmI") {
    include "WhoAmI.cfm";
} else {
    // Expect a 4-digit MusicianID followed by "/" and an SEO slug
    IndexOfSlash = Find("/", QString);
    if (IndexOfSlash eq 5) {
        ThisID = left(QString, 4);
        if (isNumeric(ThisID)) {
            MusicianID = ThisID;
            include "Musician.cfm";
        }
    } else {
        // Anything else goes back to the home page
        location(url="http://www.awesomealbums.info", addtoken=false);
    }
}
</cfscript>
How can I change my site so that the question mark can be removed, the web server doesn't get funky, and I can still parse out the query string?
The keyword you are looking for is URL rewriting. It has to happen on the web server, since you want to handle all requests in the top-level directory. If your web server is Apache httpd, you can do it like this:
RewriteEngine on
RewriteRule ^/(\d{4})/([\w-]+)$ /?$1/$2 [L]
or
RewriteRule ^/(\d{4})/([\w-]+)$ /Musician.cfm?MusicianID=$1 [L]
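To see what the second rule does, here is the same substitution in Python's re module, with \1 standing in for Apache's $1. Two caveats: Python's \w is Unicode-aware for str patterns while PCRE's is typically ASCII-only, and in an .htaccess (per-directory) context Apache strips the leading slash before matching, so the pattern there would start with ^(\d{4}) rather than ^/(\d{4}). rewrite is a hypothetical helper name.

```python
import re

# Same pattern as the RewriteRule: four digits, a slash, then an SEO slug.
RULE = re.compile(r"^/(\d{4})/([\w-]+)$")


def rewrite(path: str) -> str:
    """Emulate the second RewriteRule; unmatched paths pass through unchanged."""
    # re.sub with a backreference plays the role of Apache's $1.
    return RULE.sub(r"/Musician.cfm?MusicianID=\1", path)
```

The slug is matched but discarded, so jim-croce only serves SEO; the ID alone selects the musician.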
Since you can't modify the web server (which, as Roland correctly suggests, is where the rewriting would normally happen), there is one alternative: use URLs that look like this:
http://awesomealbums.info/index.cfm/1085/jim-croce
Structured this way, the web server (IIS) will still pass control to your script, and CF can take over the processing from there. CGI.QUERY_STRING will be empty, but CGI.PATH_INFO will contain /1085/jim-croce, which you can then parse and handle as needed.
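Once the URL has the /index.cfm/1085/jim-croce shape, parsing PATH_INFO can mirror the original script's checks (a 4-character numeric ID, then a slug). The sketch below is in Python with parse_path_info as a hypothetical name; in CF the same idea could be built on ListToArray(CGI.PATH_INFO, "/") or similar list functions.

```python
from typing import Optional, Tuple

DIGITS = "0123456789"


def parse_path_info(path_info: str) -> Optional[Tuple[str, str]]:
    """Split '/1085/jim-croce' into ('1085', 'jim-croce').

    Returns None when the shape doesn't fit, so the caller can
    redirect to the home page instead.
    """
    # Drop empty segments from leading/trailing slashes.
    segments = [s for s in path_info.split("/") if s]
    if len(segments) != 2:
        return None
    musician_id, slug = segments
    # Same rule as the original script: exactly four ASCII digits.
    if len(musician_id) != 4 or not all(c in DIGITS for c in musician_id):
        return None
    return musician_id, slug
```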
The answer for IIS is basically the same as the one Roland gives: you need to use URL rewriting, which is the only way to accomplish what you are looking to do. This is because the URL you want to request does not technically exist as a real page or resource on the server, so URL rewriting has to intercept the request, map it using regular expressions, and pass it to your application as the query string (page parameters) you are expecting. So, if you are on a hosted server, contact your host and see whether they have something set up or installed for URL rewriting. Almost any decent host will.
If you are using IIS 7, information on using the URL Rewrite module can be found at:
http://learn.iis.net/page.aspx/460/using-the-url-rewrite-module/
If you are using an older version of IIS, you need to install an add-on on the server, because earlier versions of IIS have no built-in support for the regular-expression-based rewriting needed to map your URLs to the correct query string parameters. For older versions of IIS, I've used Helicon IsapiRewrite, which you can find at:
http://www.isapirewrite.com/
Building on jfeasel's answer, you can use ColdCourse:
http://coldcourse.riaforge.org/