Will youtube address with only code usually work? - regex

In setting up a part of a site where users can enter url for youtube video and have it play in modal window -- currently using fancybox -- have found that some urls don't work, even with fancybox regex for the href.
However, it appears that just getting the video's code and appending it to a basic url stem works. Such as:
http://www.youtube.com/watch?v={video code}
as in http://www.youtube.com/watch?v=OMYzsAurxHY
I run the submitted url through a series of ColdFusion regexs to get the code, and then append to the "http://www.youtube.com/watch?v=" stem.
Note having any problems, but just want to see if anyone more familiar with youtube sees any problem with assuming this stem + video code wouldn't always work -- or at least a great percentage of the time.
Thanks

Related

Regex specific question and search function on my website dealing with broken links

I've been trying to figure out my regex pattern but it doesn't seem to be working for me.
Here's what i'm trying to do:
I have broken links on my website if someone accidentally gets to a page like so:
https://example.com/catalogsearch/result/?q=
or
https://example.com/catalogsearch/result/
So i'm redirecting them back to my homepage. The problem is now the search is just sending everything back to the homepage. So i'm assuming if there is something after the equals it needs to continue the search.. obviously
https://example.com/catalogsearch/result/?q=person
but currently i can't figure this out..
Here is my regex that i've been messing with for quite sometime now... still seems to be wrong or something else is wrong with my search.
"^/catalogsearch/result((/)|(/\\?)|(/\\?[a-z])|(/\\?[a-z]=))?$"
Please forgive me i'm horrible with regex.
After a lot of discussion, it is concluded that the routes.yaml will consider the url path as a valid route but not the query string part. Hence out of the two examples in the post, you can use
"/catalogsearch/result": { to: "https://example.com/", prefix: false }
and for other one please change it in nginx config to redirect to homepage or if its not possible then check with magento support on how to incorporate the query string part in routes.yaml file.

content empty when using scrapy

Thanks for everyone in advance.
I encountered a problem when using Scrapy on Python 2.7.
The webpage I tried to crawl is a discussion board for Chinese stock market.
When I tried to get the first number "42177" just under the banner of this page (the number you see on that webpage may not be the number you see in the picture shown here, because it represents the number of times this article has been read and is updated realtime...), I always get an empty content. I am aware that this might be the dynamic content issue, but yet don't have a clue how to crawl it properly.
The code I used is:
item["read"] = info.xpath("div[#id='zwmbti']/div[#id='zwmbtilr']/span[#class='tc1']/text()").extract()
I think the xpath is set correctly and I have checked the return value of this response and it indeed told me that there is nothing under this directory. Results shown here:'read': [u'<div id="zwmbtilr"></div>']
If it has something, there should be something between <div id="zwmbtilr"> and </div>.
Really appreciated if you guys share any thoughts on this!
I just opened your link in Firefox with NoScript enabled. There nothing inside the <div #id='zwmbtilr'></div>. If I enable the javascripts, I can see the content you want. So, as you already new, it is a dynamic content issue.
Your first option is try to identify the request generated by javascript. If you can do that, you can send the same request from scrapy. If you can't do it, the next option is usually to use some package with javascript/browser emulation or someting like that. Something like ScrapyJS or Scrapy + Selenium.

How to ensure users only embed SoundCloud iFrame?

I am building a social networking website for musicians and I would like them to be able to enter the embed code provided by SoundCloud, so that they may have a sound clip on their posts.
However, I am unsure how I would sanitise the input, to ensure that it's only a SoundCloud iframe embed code that they enter. I want to avoid them pasting in embed code for say, YouTube or anything else for that matter.
An example embed code from SoundCloud looks like:
<iframe width="100%" height="166" scrolling="no" frameborder="no" src="https://w.soundcloud.com/player/?url=http%3A%2F%2Fapi.soundcloud.com%2Ftracks%2F85146642"></iframe>
I am using the HTML parser, jSoup to sanitise input.
The key fragment to this is the src content:
https://w.soundcloud.com/player/?url=http%3A%2F%2Fapi.soundcloud.com%2Ftracks%2F85146642
One possibility I thought of, was to extract the src parameters value and then rebuild the iframe myself, this way, only storing the URL and ensuring that any HTML output to the browser is that which I have created myself. Doing this may also allow me to run checks on the domain name etc.
I'm wondering what the best approach would be for this?
Appreciate any input you may have.
Thanks,
Michael.
PS - I am using Railo (ColdFusion server) and the Java jSoup library, but I guess the same principles would apply regardless of what language one would use.

URL redirect plugin regex input for match and target

I'm panicking a little, so sorry if I haven't explained well enough.
I've dealt with quite the nightmare of a permalink restructuring experience
Old permalink= sitename/archives/postid
desired new= sitename/postname
tried everything it seems. I've even dabbled with /?p=$1 (<-----that nonsense!). But now i'm getting some crazy error when i go to my old permalink structure that reads:
Oops! Google Chrome could not connect to 0.0.37.89
Suggestions:
Try reloading: 0.­0.­37.­89
and this was supposed to be "redirected".
I give up. please help.
sitename= brightontheday.com
I used the redirection plugin to redirect all old URL permalinks (/archives/postID) to the new permalink (/postID/postname)
also, the issue appeared to be due to cashing via cloudfare. It's important to to note that one should put cloudfare in "developer mode" while making site wide changes.

URL routing process works on one web, not another. 100% processor usage

I thought I have URL routing under control as it works well on one website but found out it is not working fine on another. Both websites run from same server, same IIS 6.0, same asp_isapi.dll.
SETUP 1: This one works well:
routes.MapPageRoute("Article SEO",
"sid/{sid}",
"~/ar.aspx",
true,
new RouteValueDictionary { },
new RouteValueDictionary { { "sid", #"^([a-zA-Z\-]*)+([a-zA-Z])+(.html)$" } }
);
SETUP 2: But this one, very similar is not working well:
routes.MapPageRoute("Article",
"page/{sid}",
"~/page.aspx",
true,
new RouteValueDictionary { },
new RouteValueDictionary { { "sid", #"^([a-zA-Z0-9\-]*)+([a-zA-Z0-9])+(.html)$" } }
);
Testing Regex in the Regex Coach shows that they are written correctly, I mean they both catch good or wrong strings.
URL I use for the second one is http://address/page/some-html-keywords.html. If I specify URL like this it works well.
Problem is if I change .html extension for something like .htmls or .anything it completely kills web server. I have 100% process usage. I dont understand why and how, I dont have this problem with first setup. I can change it for whatever I want and it either shows page because I have correct format or shows 404 page not found.
Some examples:
http://address/page - 404 page, working correctly
http://address/page/test.html - accepted, working correctly
http://address/page/testing_#.html - 404 page, working correctly
http://address/page/test.htmls - wont show page, hanging, 100% process usage, not working correctly
http://address/page/test.whatever - wont show page, hanging, 100% process usage, not working correctly
http://address/page/page.aspx - redirects, working correctly
The same setup (with different Regex check) works well on other website within same IIS 6.0. Both use same asp_isapi.dll file.
I just dont get it. I have tried to comment all code in page.aspx to find out if there was problem with the code within page.aspx but it doesnt matter. It simply hangs with empty page as well. So must be problem with URL routing or isapi.dll or IIS. But other website on same IIS and same machine simply works.
Any opinions?
Thank you
Fero
I don't know anything about URL routing
BUT I notice that the regular expression you specify
#"^([a-zA-Z0-9\-]*)+([a-zA-Z0-9])+(.html)$"
Looks to be the same in both code samples AND (again, in both examples) ends with a trailing $ (which means end-of-string), which will prohibit anything NOT ending in .html from being matched by that regular expression. To get .htmls you need (.html.*)$, to get .anything you need something like
#"^([a-zA-Z0-9\-]*)+([a-zA-Z0-9])+\.[a-zA-Z0-9]*$"
Also, it probably would be a good idea to esacpe the '.' just before html, like \.html, as reg expressions normally process '.' to mean any single character, which includes the '.' character.
I Hope this helps.
Learn how to analyze ASP.NET high CPU root cause, and then you will find out why,
http://blogs.msdn.com/b/tess/archive/2008/02/22/net-debugging-demos-lab-4-high-cpu-hang.aspx