How can I use a regex to validate slideshare slideshow URLs? - regex

I am using www.slideshare.net to allow my users to display embedded slideshows on their profiles.
I'm using SlideShare's API to get the slideshow's id from the slideshow link, which users get by clicking 'share' on the slideshow and copying the URL:
What I need is to thoroughly validate that URL.
To explain my process further: once I have the slideshow's id, I build the embed code like so:
"<iframe src='https://www.slideshare.net/slideshow/embed_code/" + json.slideshow_id + "' frameborder='0' allowfullscreen webkitallowfullscreen mozillaallowfullscreen></iframe>"
where json is the object returned by SlideShare's API.
A basic regex to answer my question would be:
^http\://www\.slideshare\.net/[a-zA-Z0-9\-]+/[a-zA-Z0-9\-]+$
But it feels a little weak to me:
I don't want my users to just copy/paste the URL from the browser's address bar
I'm not sure this regex works for all of SlideShare's slideshows, as I'm not a SlideShare specialist (does that even exist?)
Ideally, I would like to exclude all other regular URLs from www.slideshare.net that don't point to a slideshow.
EDIT 7/12/2014: rewrite

You can use something like this:
(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,#?^=%&:/~\+#]*[\w\-\#?^=%&/~\+#])?
More examples are available from this website.
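The regex above matches URLs in general rather than SlideShare links specifically. As a rough sketch of a stricter structural check, the Python snippet below mirrors the question's own pattern (host www.slideshare.net followed by exactly two path segments of letters, digits and hyphens). The function name looks_like_slideshow_url is hypothetical, and accepting https and ignoring any query string are assumptions, not something the question specifies. A URL that passes still has to be confirmed by the API lookup the question already performs.

```python
import re
from urllib.parse import urlsplit

# Same character class as the question's regex: letters, digits, hyphen.
SEGMENT = re.compile(r"[A-Za-z0-9-]+")


def looks_like_slideshow_url(url):
    """Return True if url has the shape http(s)://www.slideshare.net/<user>/<slug>."""
    try:
        parts = urlsplit(url.strip())
    except ValueError:
        return False
    # Assumption: both http and https are acceptable.
    if parts.scheme not in ("http", "https"):
        return False
    if parts.netloc.lower() != "www.slideshare.net":
        return False
    # Exactly two non-empty path segments; the query string is ignored.
    segments = parts.path.strip("/").split("/")
    return len(segments) == 2 and all(SEGMENT.fullmatch(s) for s in segments)
```

Splitting the URL first keeps the host check separate from the path check, so a link that merely contains www.slideshare.net somewhere in its path is rejected.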

Related

How to generate Django template as MS word?

Is there a way to build a Django template that works like Microsoft Word, so that the page has editing tools and users can write Word documents from the website?
Suppose there is a domain called example.com.
When I open example.com/word-doc/, it should show a blank page with some Microsoft Word-like tools to write with.
Is there any package or API to do so?
You are probably looking for a kind of "wysiwyg" editor for the client side (this part has nothing to do with Django). The closest thing I could find is this one; however, it comes with a price tag.
On the Django side you'll (probably) simply store the data produced by the editor (client). The editor will then be able to parse it for update/read purposes.

Regex to replace spam links in Wordpress

I am dealing with old hacked WordPress sites where spam links have been injected on images.
I have access to the database and would like to remove links that look like this:
<a style="text-decoration:none" href="/ansaid-retail-cost">.</a>
The text inside the href varies (it might be for Cialis or any other product), but the rest doesn't vary. I want to remove the entire link, so that the result is a single space.
I don't know regex, so I would appreciate the help. I've tried online generators but they don't seem to work.
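As a minimal sketch of the kind of pattern the question describes, the Python snippet below replaces each injected link with a single space. It assumes every spam link has exactly the style attribute, attribute order and "." link text shown in the example, with only the href value changing. SPAM_LINK and strip_spam_links are hypothetical names. Test it on a copy of the data before running it against a live database.

```python
import re

# Matches only links with this exact style attribute and a "." as link text;
# the href value is allowed to vary.
SPAM_LINK = re.compile(r'<a style="text-decoration:none" href="[^"]*">\.</a>')


def strip_spam_links(html_text):
    """Replace every injected spam link with a single space."""
    return SPAM_LINK.sub(" ", html_text)
```

The same pattern should carry over to other regex engines with little change, since it uses only a literal prefix, a negated character class and an escaped dot.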

Django. How to make html(tinymce) input safe on form.clean?

For example, I have an "Add comment" form on my Django-powered website.
This form has a text field with TinyMCE.
I want users to be able to use only the p, strong, i, ul, ol and li tags. Because the result is HTML code, I can't use strip_tags in my AddCommentForm.clean_text method. I also need to be sure that the result doesn't contain any vulnerabilities (js, iframe, etc.)
I believe you can advise me on a good solution for this))
This can be done on the TinyMCE side via some configuration parameters. While it's not 100% secure against someone POSTing directly, it's better than nothing.
It should just be a matter of tweaking your valid_elements config in your TinyMCE setup to only allow what you want:
...
valid_elements : "p,strong/b,i/em,ul,ol,li",
...
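Because the TinyMCE setting does not stop someone from POSTing directly, a server-side pass is still needed. The following is a rough sketch, using only Python's standard library, of an allowlist filter that could be called from a method such as clean_text. The names AllowlistSanitizer and sanitize_comment are hypothetical, the tag list is the one from the question, and dropping all attributes is an assumption. It illustrates the idea and is not a vetted security library.

```python
from html import escape
from html.parser import HTMLParser

ALLOWED_TAGS = {"p", "strong", "i", "ul", "ol", "li"}  # list from the question
DROP_CONTENT_TAGS = {"script", "style"}  # drop these tags together with their contents


class AllowlistSanitizer(HTMLParser):
    """Re-emit only allowed tags (without attributes) and escaped text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT_TAGS:
            self.skip_depth += 1
        elif tag in ALLOWED_TAGS and not self.skip_depth:
            self.parts.append(f"<{tag}>")  # attributes are discarded

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in ALLOWED_TAGS and not self.skip_depth:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(escape(data))  # text is always re-escaped


def sanitize_comment(html_text):
    parser = AllowlistSanitizer()
    parser.feed(html_text)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)
```

Rebuilding the output from scratch means that anything not explicitly allowed never reaches the page. Disallowed tags such as iframe are dropped, while their text content is kept in escaped form.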

How to ensure users only embed SoundCloud iFrame?

I am building a social networking website for musicians and I would like them to be able to enter the embed code provided by SoundCloud, so that they may have a sound clip on their posts.
However, I am unsure how I would sanitise the input, to ensure that it's only a SoundCloud iframe embed code that they enter. I want to avoid them pasting in embed code for say, YouTube or anything else for that matter.
An example embed code from SoundCloud looks like:
<iframe width="100%" height="166" scrolling="no" frameborder="no" src="https://w.soundcloud.com/player/?url=http%3A%2F%2Fapi.soundcloud.com%2Ftracks%2F85146642"></iframe>
I am using the HTML parser jSoup to sanitise input.
The key fragment here is the src content:
https://w.soundcloud.com/player/?url=http%3A%2F%2Fapi.soundcloud.com%2Ftracks%2F85146642
One possibility I thought of was to extract the src attribute's value and then rebuild the iframe myself. That way I would store only the URL and ensure that any HTML output to the browser is markup I have created myself. Doing this may also allow me to run checks on the domain name etc.
I'm wondering what the best approach would be for this?
Appreciate any input you may have.
Thanks,
Michael.
PS - I am using Railo (ColdFusion server) and the Java jSoup library, but I guess the same principles would apply regardless of what language one would use.
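The extract-and-rebuild idea described above can be sketched as follows. This is Python rather than Railo/jSoup, chosen only for illustration, and the names IframeSrcFinder, extract_soundcloud_src and build_iframe are hypothetical. The accepted scheme, host and path are taken from the example embed code; treating them as the only valid form is an assumption.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlsplit


class IframeSrcFinder(HTMLParser):
    """Collect the src attribute of every <iframe> in the input."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            self.sources.append(dict(attrs).get("src"))


def extract_soundcloud_src(embed_code):
    """Return the player URL if the input holds exactly one SoundCloud iframe, else None."""
    finder = IframeSrcFinder()
    finder.feed(embed_code)
    finder.close()
    if len(finder.sources) != 1 or not finder.sources[0]:
        return None
    src = finder.sources[0]
    try:
        parts = urlsplit(src)
    except ValueError:
        return None
    # Assumption: only the host and path seen in the example are accepted.
    if (parts.scheme, parts.netloc, parts.path) != ("https", "w.soundcloud.com", "/player/"):
        return None
    return src


def build_iframe(src):
    """Rebuild the iframe from scratch so only our own markup reaches the browser."""
    return (
        '<iframe width="100%" height="166" scrolling="no" frameborder="no" '
        'src="{}"></iframe>'.format(escape(src, quote=True))
    )
```

Only the validated URL would be stored. Everything else the user pasted is discarded, and the markup is regenerated at output time.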

Parse exported bookmarks file with ColdFusion

I need to parse a list of bookmarks exported from a browser like Chrome, Firefox or IE. Maybe even Google etc.
I played around and did something like this: a reMatchNoCase("(<h3)(.*?)(</dl>)",myfile1) loop. Then I use reMatchNoCase("(<dt[>])(.*?)(</a>)",i) within the h3/dl tags, followed by a lot of cleanup, but it's really not reliable.
The thing is that the categories are h3 tags surrounded by dl tags, with the bookmarks inside them. I can't just parse all the URLs, since I want to get the categories as they appear in the browser.
Thanks.
If it is XHTML, use XPath.
If it is not, it won't be easy. Search https://stackoverflow.com/search?q=parse+html
Could you consider a hybrid approach: parse with jQuery on the client side first, then post the result to CF?
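As an illustration of parsing with an HTML parser instead of regular expressions, here is a small Python sketch that walks the h3/dl nesting described in the question and records each bookmark with its folder path. It is not ColdFusion, the class name BookmarkParser is hypothetical, and it assumes each folder's h3 is followed by the dl that holds its contents, as the question describes.

```python
from html.parser import HTMLParser


class BookmarkParser(HTMLParser):
    """Collect (folder_path, title, href) tuples from an exported bookmarks file."""

    def __init__(self):
        super().__init__()
        self.folders = []    # one entry per open <dl>; None for an unnamed list
        self.pending = None  # most recent <h3> title, waiting for its <dl>
        self.capture = None  # "h3" or "a" while collecting text
        self.text = []
        self.href = None
        self.bookmarks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h3", "a"):
            self.capture = tag
            self.text = []
            if tag == "a":
                self.href = dict(attrs).get("href")
        elif tag == "dl":
            self.folders.append(self.pending)
            self.pending = None

    def handle_data(self, data):
        if self.capture:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag == self.capture:
            title = "".join(self.text).strip()
            if tag == "h3":
                self.pending = title
            else:
                path = [name for name in self.folders if name]
                self.bookmarks.append((path, title, self.href))
            self.capture = None
        elif tag == "dl" and self.folders:
            self.folders.pop()
```

After calling feed() with the file contents, the bookmarks attribute holds one tuple per link, with the folder path listed from the outermost category inwards.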