How to ensure users only embed a SoundCloud iframe? - coldfusion

I am building a social networking website for musicians and I would like them to be able to enter the embed code provided by SoundCloud, so that they may have a sound clip on their posts.
However, I am unsure how I would sanitise the input to ensure that it's only a SoundCloud iframe embed code that they enter. I want to avoid them pasting in embed code for, say, YouTube, or anything else for that matter.
An example embed code from SoundCloud looks like:
<iframe width="100%" height="166" scrolling="no" frameborder="no" src="https://w.soundcloud.com/player/?url=http%3A%2F%2Fapi.soundcloud.com%2Ftracks%2F85146642"></iframe>
I am using the HTML parser, jSoup to sanitise input.
The key fragment to this is the src content:
https://w.soundcloud.com/player/?url=http%3A%2F%2Fapi.soundcloud.com%2Ftracks%2F85146642
One possibility I thought of was to extract the src attribute's value and then rebuild the iframe myself. This way I would only store the URL and ensure that any HTML output to the browser is HTML I have created myself. Doing this may also allow me to run checks on the domain name, etc.
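A minimal Java sketch of the "extract the src and rebuild" idea, assuming the player URL always has the shape shown above (https://w.soundcloud.com/player/?url= followed by an encoded api.soundcloud.com/tracks/<id> URL). It pulls the src with a plain regex to stay dependency-free; with jSoup you would more likely use Jsoup.parse(input).select("iframe").attr("src"). The class and method names are hypothetical. Only the numeric track id is kept, and the iframe is rebuilt from a fixed template, so nothing the user typed reaches the page verbatim.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: keeps only the SoundCloud track id and rebuilds the iframe itself.
final class SoundCloudEmbed {
    // Plain-regex extraction of the src attribute (jSoup's select("iframe").attr("src") is sturdier).
    private static final Pattern SRC =
            Pattern.compile("src\\s*=\\s*\"([^\"]+)\"", Pattern.CASE_INSENSITIVE);
    // Assumed shape of the decoded inner URL: http(s)://api.soundcloud.com/tracks/<digits>
    private static final Pattern TRACK =
            Pattern.compile("^https?://api\\.soundcloud\\.com/tracks/(\\d+)$");

    static Optional<String> extractTrackId(String embedCode) {
        Matcher m = SRC.matcher(embedCode);
        if (!m.find()) {
            return Optional.empty();
        }
        try {
            URI src = new URI(m.group(1));
            // Scheme, host and path must match the player URL exactly.
            if (!"https".equals(src.getScheme())
                    || !"w.soundcloud.com".equals(src.getHost())
                    || !"/player/".equals(src.getPath())
                    || src.getRawQuery() == null) {
                return Optional.empty();
            }
            for (String pair : src.getRawQuery().split("&")) {
                if (pair.startsWith("url=")) {
                    String inner = URLDecoder.decode(pair.substring(4), StandardCharsets.UTF_8);
                    Matcher t = TRACK.matcher(inner);
                    return t.matches() ? Optional.of(t.group(1)) : Optional.empty();
                }
            }
        } catch (URISyntaxException | IllegalArgumentException e) {
            // Malformed src or bad %-escape: not an acceptable embed.
        }
        return Optional.empty();
    }

    // Rebuild the markup from a fixed template; only the digits come from user input.
    static String buildIframe(String trackId) {
        if (!trackId.matches("\\d+")) {
            throw new IllegalArgumentException("track id must be numeric");
        }
        String inner = "http://api.soundcloud.com/tracks/" + trackId;
        String src = "https://w.soundcloud.com/player/?url="
                + URLEncoder.encode(inner, StandardCharsets.UTF_8);
        return "<iframe width=\"100%\" height=\"166\" scrolling=\"no\" frameborder=\"no\" src=\""
                + src + "\"></iframe>";
    }
}
```

Storing only the track id also means the domain check happens once, at input time, and the player markup can change later without touching stored data.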
I'm wondering what the best approach for this would be.
Appreciate any input you may have.
Thanks,
Michael.
PS - I am using Railo (ColdFusion server) and the Java jSoup library, but I guess the same principles would apply regardless of what language one would use.

Related

XSS DOM vulnerable

I tested a site for vulnerabilities (folder /service-contact) and a possible DOM-based XSS issue came up (using Kali Linux, Vega and XSSER). I then tried to test the URL manually with an 'alert' script to make sure it's vulnerable. I used:
www.babyland.nl/service-contact/alert("test")
No alert box/pop-up was shown; only the HTML code showed up in the contact form box.
I am not sure I used the right code (I'm a rookie) or interpreted the result correctly. The server is Apache, using JavaScript.
Can you help?
Thanks!
This is not vulnerable to XSS. Whatever you write in the URL comes out in the form section below (Vraag/opmerking), and the double quotes (") are escaped. If you try another payload like <script>alert(/xss/)</script>, that won't work either, because the input is neither reflected nor stored. You will see the output as text in Vraag/opmerking. Don't rely on online scanners; test manually. For DOM-based XSS, check the sinks and sources and analyze them.
The tool is right: there is an XSS vulnerability on the site, but the proof-of-concept (PoC) code is wrong. The content of a <textarea> can only contain character data (see the <textarea> description on MDN), so your <script>alert("test")</script> is interpreted as text, not as HTML. But you can close the <textarea> tag and insert the JavaScript code after that.
Here is the working PoC URL:
https://www.babyland.nl/service-contact/</textarea><script>alert("test")</script>
which is rendered as:
<textarea rows="" cols="" id="comment" name="comment"></textarea><script>alert("test")</script></textarea>
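The breakout works because the reflected path is written into the textarea without escaping "<". The site's server-side code isn't shown, so the following Java sketch is only illustrative: a hypothetical render helper that HTML-escapes the reflected value before placing it inside the textarea, which turns the PoC's closing </textarea> into harmless text.

```java
// Hypothetical server-side helper: the real page's template code is not shown in the question.
final class CommentField {
    // Escape the characters that let text break out of an HTML element or attribute.
    static String escapeHtml(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    // Reflect the URL fragment into the textarea only after escaping it.
    static String render(String reflected) {
        return "<textarea rows=\"\" cols=\"\" id=\"comment\" name=\"comment\">"
                + escapeHtml(reflected) + "</textarea>";
    }
}
```

With the PoC path as input, the output contains exactly one real </textarea>, the one the template itself emits.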
A small note on testing for XSS injection: Chrome/Chromium has built-in XSS protection (the XSS Auditor), so this PoC doesn't fire in that browser. For manual testing you can use Firefox, or run Chrome with --disable-xss-auditor (see this StackOverflow question and this one for more information).

How can I use a regex to validate slideshare slideshow URLs?

I am using www.slideshare.net to allow my users to display embedded slideshows on their profiles.
I'm using SlideShare's API to get the slideshow's id, given the slideshow link that users get by clicking 'share' on the slideshow and copy/pasting the URL:
What I need is to thoroughly validate that URL.
To further explain my process: once I have the slideshow's id, I compute the embed code like so:
"<iframe src='https://www.slideshare.net/slideshow/embed_code/" + json.slideshow_id + "' frameborder='0' allowfullscreen webkitallowfullscreen mozallowfullscreen></iframe>"
where json is the object returned by slideshare's api.
A basic regex to answer my question would be:
^http\://www\.slideshare\.net/[a-zA-Z0-9\-]+/[a-zA-Z0-9\-]+$
But it feels a little weak to me:
I don't want my users to just copy/paste the URL from the browser's address bar.
I'm not sure this regex works for all of SlideShare's slideshows, as I'm not a SlideShare specialist (does that even exist?).
Ideally I would like to exclude all other regular URLs from www.slideshare.net that don't point to a slideshow.
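A possible Java sketch, assuming slideshow links have the two-segment form https://www.slideshare.net/<user>/<slug> that the question's regex targets (not verified against every SlideShare URL scheme). It tightens the regex slightly (http or https, optional www, optional trailing slash) and builds the iframe only from an id made of digits, which is itself an assumption about the API's slideshow_id. A regex cannot tell a slideshow page from any other two-segment page on the site, so the API lookup that returns slideshow_id remains the real check; treat the regex as a cheap pre-filter. Class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the URL shape is an assumption based on the question's own regex.
final class SlideshareLinks {
    // http or https, optional "www.", exactly two path segments, optional trailing slash.
    private static final Pattern SLIDESHOW_URL = Pattern.compile(
            "^https?://(?:www\\.)?slideshare\\.net/[A-Za-z0-9_-]+/[A-Za-z0-9_-]+/?$");

    // Cheap pre-filter only: the API lookup is what proves the page is a slideshow.
    static boolean looksLikeSlideshowUrl(String url) {
        return url != null && SLIDESHOW_URL.matcher(url.trim()).matches();
    }

    // Build the iframe from the API's slideshow_id, accepting digits only (assumed id format).
    static String embedCode(String slideshowId) {
        if (slideshowId == null || !slideshowId.matches("\\d+")) {
            throw new IllegalArgumentException("unexpected slideshow id: " + slideshowId);
        }
        return "<iframe src='https://www.slideshare.net/slideshow/embed_code/" + slideshowId
                + "' frameborder='0' allowfullscreen webkitallowfullscreen mozallowfullscreen></iframe>";
    }
}
```

Rejecting a non-numeric id keeps a quote character from the API response out of the single-quoted src attribute.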
EDIT 7/12/2014: rewrite
You can use something like this:
(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,#?^=%&:/~\+#]*[\w\-\#?^=%&/~\+#])?
More examples are on this website.

MVC - Strip unwanted text from rss feed

I've got the following code in my RSS consumer (Vandelay Industries RemoteRSS) in my Orchard CMS implementation:
@using System.Xml.Linq
@{
    var feed = Model.Feed as XElement;
}
<ul>
@foreach(var item in feed
    .Element("channel")
    .Elements("item")
    .Take((int)Model.ItemsToDisplay))
{
    <li>@T(item.Element("description").Value)</li>
}
</ul>
The RSS feed I'm using is from Pinterest, which bundles the image, link, and a short description all inside the 'description' element of each item.
<description><a href="/pin/215609900882251703/"><img src="http://media-cache-ec2.pinterest.com/upload/88664686384961121_UIyVRN8A_b.jpg"></a>How to install Orchard CMS on IIS Server</description>
My issue is that I don't want the text bits, and I also need to prefix the 'href=' links with 'http://www.pinterest.com'.
With my newbie skills I've managed to edit the original code into the above, which essentially displays the images as links, but the links are only relative and thus point locally to my server. The images are also followed by the short description.
So to summarise, I need a way to prefix all links with 'http://pinterest.com' and then remove the free text after the images/links.
Any pointers will be greatly appreciated, Thanks.
You should probably parse the description with something like http://htmlagilitypack.codeplex.com/, and then tweak it to add the prefix. Or you can learn regular expressions and do without a library, though that could be a little trickier and more error-prone.
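A regex-based sketch of the second option (no parser), written in Java like the other examples here. It assumes every description starts with one <a ...>...</a> wrapping the image, as in the sample above, keeps only that element, and prefixes root-relative href values with a base URL. The class name is hypothetical, and a real HTML parser such as HTML Agility Pack is more robust. The pattern should carry over to .NET's Regex, with RegexOptions.Singleline in place of Java's DOTALL.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; assumes the description format shown in the question.
final class PinterestDescription {
    // First <a ...>...</a> element, matched lazily so it stops at the first </a>.
    private static final Pattern FIRST_LINK =
            Pattern.compile("<a\\s[^>]*>.*?</a>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    // Root-relative href values such as href="/pin/...", but not protocol-relative "//host".
    private static final Pattern RELATIVE_HREF = Pattern.compile("href=\"/(?!/)");

    static String linkOnly(String descriptionHtml, String baseUrl) {
        Matcher m = FIRST_LINK.matcher(descriptionHtml);
        if (!m.find()) {
            return ""; // no link: drop the description entirely
        }
        // Keep the image link, drop the trailing text, and absolutize the href.
        return RELATIVE_HREF.matcher(m.group())
                .replaceAll(Matcher.quoteReplacement("href=\"" + baseUrl + "/"));
    }
}
```

Calling linkOnly(description, "http://www.pinterest.com") on the sample description returns just the image link, now pointing at www.pinterest.com.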

Cleansing string / input in Coldfusion 9

I have been working with Coldfusion 9 lately (background in PHP primarily) and I am scratching my head trying to figure out how to 'clean/sanitize' input / string that is user submitted.
I want to make it HTML-safe and eliminate any JavaScript or SQL injection, the usual.
I am hoping I've overlooked some kind of function that already comes with CF9.
Can someone point me in the proper direction?
Well, for SQL injection, you want to use CFQUERYPARAM.
As for sanitizing the input for XSS and the like, you can use the ScriptProtect attribute in CFAPPLICATION, though I've heard that doesn't work flawlessly. You could look at Portcullis or similar 3rd-party CFCs for better script protection if you prefer.
This is an addition to Kyle's suggestions, not an alternative answer, but the comments panel is a bit rubbish for links.
Take a look at the ColdFusion string functions. You've got HTMLCodeFormat, HTMLEditFormat, JSStringFormat and URLEncodedFormat, all of which can help you work with content posted from a form.
You can also try to use the regex functions to remove HTML tags, but it's never a precise science. This ColdFusion-based regex/HTML question should help there a bit.
You can also try to protect yourself from bots and known spammers using something like cfformprotect, which integrates Project Honeypot and Akismet protection amongst other tools into your forms.
You've got several options:
"Global Script Protection" Administrator setting, which applies a regular expression against post and get (i.e. FORM and URL) variables to strip out <script/>, <img/> and several other tags
Use isValid() to validate variables' data types (see my in-depth answer on this one).
<cfqueryparam/>, which serves to create SQL bind parameters and validate the datatype passed to it.
That noted, if you are really trying to sanitize HTML, use Java, which ColdFusion can access natively. In particular use the OWASP AntiSamy Project, which takes an HTML fragment and whitelists what values can be part of it. This is the same approach that sites like SO and slashdot.org use to protect submissions and is a more secure approach to accepting markup content.
Sanitizing strings in ColdFusion, as in almost any language, is very important, and how you do it depends on what you want to do with the string. Most mitigations are for:
saving content to database (e.g. <cfqueryparam ...>)
using content to show on next page (e.g. put url-parameter in link or show url-parameter in text)
saving files and using upload filenames and content
There is always a risk if you allow basically everything in the first step and then try to sanitize malicious code "away" by deleting or replacing characters (the blacklist approach).
Whenever possible, the better and easy solution is to use reReplace(...) or reFind(...) with regular expressions that explicitly allow only the characters needed for your scenario (the whitelist approach). Use cases are inputs for numbers, lists, email addresses, URLs, names, ZIP codes, cities, etc.
For example, if you want to ask for an email address, you could use
<cfif reFindNoCase("^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}$", stringtosanitize)>...ok, clean...<cfelse>...not ok...</cfif>
(or your own regex).
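The whitelist idea can be sketched in Java, which ColdFusion can call directly. The patterns and names below are illustrative assumptions, not complete validators (the email pattern in particular accepts only a simple subset of addresses); each one names exactly the characters allowed, and anything else is rejected or stripped.

```java
import java.util.regex.Pattern;

// Illustrative whitelist patterns; each names only the characters a field may contain.
final class InputWhitelist {
    // Simple email subset, anchored by matches(); not a full RFC 5322 validator.
    static final Pattern EMAIL =
            Pattern.compile("[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,}", Pattern.CASE_INSENSITIVE);
    // Comma-separated list of non-negative integers, e.g. "3,17,42".
    static final Pattern ID_LIST = Pattern.compile("\\d+(?:,\\d+)*");

    // Validation: reject anything that does not match the whole pattern.
    static boolean isValid(Pattern pattern, String input) {
        return input != null && pattern.matcher(input).matches();
    }

    // Stripping: keep only allowed characters, similar to CF's reReplace(s, "[^0-9]", "", "all").
    static String digitsOnly(String input) {
        return input.replaceAll("[^0-9]", "");
    }
}
```

Validation (reject on mismatch) is usually preferable to stripping, since silently altered input can hide mistakes; stripping suits fields like phone or ZIP numbers where separators are expected noise.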
For HTML or CSS input I would also recommend the OWASP Java HTML Sanitizer Project.

Multiple pages html output from a .rst document in Django

I'm writing a Django app to serve some documentation written in reStructuredText.
I have many documents written in *.rst; each of them is quite long, with many sections, subsections and so on.
Displaying the whole document in a single page is not a problem using Django filters, but I'd rather have just the topic index on a first page, with links to a URL where I can display a single section/subsection (which will need some 'previous | up | home | next' links, I guess). In a way, this is similar to the 'multiple HTML page output' of a DocBook/XML to HTML conversion.
Can anyone point me in a direction to build a document tree of a *.rst document and parse a single section of it, or suggest a clever way to obtain a similar result?
Choice 1. Include URL links to the other parts of the document.
You write an index.rst, part1.rst, part2.rst, etc. And your index.rst has links to the other parts. This requires almost no work, except careful planning to make sure that your RST HTML links are correct.
There's no "parse". You just break your document into sections. Manually.
[This seems so obvious, I'm afraid to mention it.]
Choice 2. Use Sphinx. It manages table-of-contents and inter-document connections very nicely.
However, the Sphinx extensions to RST aren't handled directly by Django, so you'd need to save the Sphinx output and then display that in Django. We use the JSON HTML Builder (http://sphinx.pocoo.org/builders.html?highlight=json#sphinx.builders.html.JSONHTMLBuilder) output from Sphinx. Then we render these documents through a template.