MVC - Strip unwanted text from rss feed - regex

I've got the following code in my RSS consumer (Vandelay Industries RemoteRSS) in my Orchard CMS implementation:
@using System.Xml.Linq
@{
    var feed = Model.Feed as XElement;
}
<ul>
@foreach(var item in feed
    .Element("channel")
    .Elements("item")
    .Take((int)Model.ItemsToDisplay))
{
    <li>@T(item.Element("description").Value)</li>
}
</ul>
The RSS feed I'm using is from Pinterest, and it bundles the image, link, and a short description all inside the 'description' elements of the feed.
<description><a href="/pin/215609900882251703/"><img src="http://media-cache-ec2.pinterest.com/upload/88664686384961121_UIyVRN8A_b.jpg"></a>How to install Orchard CMS on IIS Server</description>
My issue is that I don't want the text bits, and I also need to prefix the 'href=' links with 'http://www.pinterest.com'.
I've managed to edit the original code with my newbie skills into the above, which essentially displays the images as links. Those links are only relative, so they point locally to my server. Each image is also followed by the short description.
So to summarise, I need a way to prefix all links with 'http://pinterest.com' and then to remove the free text after the image/links.
Any pointers will be greatly appreciated, Thanks.

You should probably parse the description with something like http://htmlagilitypack.codeplex.com/, and then tweak it to add the prefix. Or you can learn regular expressions and do it without a library, though that could be a little trickier and more error-prone.
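To make the parser route concrete, here is a minimal sketch. It is written in Python with only the standard library rather than C# with the HTML Agility Pack, so treat it as an illustration of the idea and not as Orchard code. The names `LinkedImageExtractor`, `clean_description` and `BASE_URL` are made up for this example. It keeps only the `<a>` and `<img>` tags, drops all free text, and resolves relative `href` values against the Pinterest host.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urljoin

# Prefix requested in the question; absolute URLs are left untouched by urljoin.
BASE_URL = "http://www.pinterest.com"


class LinkedImageExtractor(HTMLParser):
    """Re-emit only <a> and <img> tags and silently drop all text content."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "img"):
            return
        rendered = []
        for name, value in attrs:
            value = value or ""
            if tag == "a" and name == "href":
                # Turn "/pin/123/" into "http://www.pinterest.com/pin/123/".
                value = urljoin(BASE_URL, value)
            rendered.append(f'{name}="{escape(value, quote=True)}"')
        if rendered:
            self.parts.append(f"<{tag} {' '.join(rendered)}>")
        else:
            self.parts.append(f"<{tag}>")

    def handle_endtag(self, tag):
        # <img> has no closing tag, so only links need one.
        if tag == "a":
            self.parts.append("</a>")


def clean_description(description_html):
    """Return the description with free text removed and links made absolute."""
    parser = LinkedImageExtractor()
    parser.feed(description_html)
    parser.close()
    return "".join(parser.parts)
```

Because text nodes are never copied to the output, the trailing description disappears without needing a pattern for it, which is the main advantage over a regex here.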

Related

Yahoo Pipes and Website Name

How do I fetch Page Name with Yahoo Pipes?
I'm making a news / blog aggregator, and need to know the name of the site where the info is coming from (bbc, cnn, fox, etc).
Do I need to do this with REGEX?
Anyone that can help?
You can fetch the page using the XPath Fetch Page or Fetch Feed modules in the Sources menu. Maybe with others too.
After that you can extract the page name itself using the various operators, possibly Regex, or others, depending on the source page you are using and the output you want to get.
In general your question is too broad and difficult to answer. To get you started, I created an example pipe that extracts the title of your question from this post, which is basically the "page name" of the current page.
http://pipes.yahoo.com/pipes/pipe.info?_id=668acf3f807c30d7b75f12459edd3252
I used the XPath Fetch Page with parameters:
URL = this page
Extract using XPath = //div[@id="question-header"]
I got that div path by inspecting the source code of this page, where I saw that div#question-header is the container of a question. I could have selected a deeper inner container or a higher-level container. It all depends on the amount of other information you need. The more information you want to use from the page, the higher-level container you select.
Next, I used the Create RSS operator to create a proper RSS feed, with parameters:
Title = h1.a
Link = h1.a.href
I chose these elements because in the container I extracted with xpath, the page name is inside h1 a. In Yahoo Pipes you use a dot as the path separator.
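For readers who want to try the same selection outside Yahoo Pipes, the two steps above (XPath to pick the container, then `h1.a` and `h1.a.href` for title and link) can be sketched in Python with the standard library. `SAMPLE_PAGE`, its link path, and `extract_title_and_link` are hypothetical and exist only for this illustration. The sketch also assumes the page is well-formed XML, which real HTML pages often are not.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the real page source.
SAMPLE_PAGE = """<html><body>
<div id="header">Site chrome</div>
<div id="question-header">
  <h1><a href="/questions/1/yahoo-pipes-and-website-name">Yahoo Pipes and Website Name</a></h1>
</div>
</body></html>"""


def extract_title_and_link(page_xml):
    """Mimic 'Extract using XPath' followed by Title = h1.a, Link = h1.a.href."""
    root = ET.fromstring(page_xml)
    # Same container the pipe selects: //div[@id="question-header"]
    container = root.find(".//div[@id='question-header']")
    if container is None:
        return None
    # Pipes writes this path as h1.a; ElementTree uses a slash instead of a dot.
    anchor = container.find("h1/a")
    if anchor is None:
        return None
    return {"title": (anchor.text or "").strip(), "link": anchor.get("href")}
```

Calling `extract_title_and_link(SAMPLE_PAGE)` returns a dict with the question title and its relative link, or `None` if the container or the `h1 a` inside it is missing.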
I found this sample pipe: http://pipes.yahoo.com/pipes/pipe.info?_id=69b5dce1c59501a0c64a660c1cfdb856. The page title includes the name of the site too. I am not sure if this is what you are looking for.

How to ensure users only embed SoundCloud iFrame?

I am building a social networking website for musicians and I would like them to be able to enter the embed code provided by SoundCloud, so that they may have a sound clip on their posts.
However, I am unsure how I would sanitise the input, to ensure that it's only a SoundCloud iframe embed code that they enter. I want to avoid them pasting in embed code for say, YouTube or anything else for that matter.
An example embed code from SoundCloud looks like:
<iframe width="100%" height="166" scrolling="no" frameborder="no" src="https://w.soundcloud.com/player/?url=http%3A%2F%2Fapi.soundcloud.com%2Ftracks%2F85146642"></iframe>
I am using the HTML parser, jSoup to sanitise input.
The key fragment to this is the src content:
https://w.soundcloud.com/player/?url=http%3A%2F%2Fapi.soundcloud.com%2Ftracks%2F85146642
One possibility I thought of was to extract the src attribute's value and then rebuild the iframe myself. That way I would only store the URL, and any HTML output to the browser would be markup I have created myself. Doing this may also allow me to run checks on the domain name etc.
I'm wondering what the best approach would be for this?
Appreciate any input you may have.
Thanks,
Michael.
PS - I am using Railo (ColdFusion server) and the Java jSoup library, but I guess the same principles would apply regardless of what language one would use.
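The extract-and-rebuild idea described above can be sketched as follows. This is Python with the standard library only, not Railo or jSoup code, so take it as an outline of the steps that should carry over to jSoup. The allowed host and path are assumptions read off the single example embed in the question, and the names `IframeSrcFinder`, `extract_soundcloud_src` and `build_iframe` are invented for the sketch.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Assumptions taken from the one example embed code in the question.
ALLOWED_HOST = "w.soundcloud.com"
ALLOWED_PATH = "/player/"


class IframeSrcFinder(HTMLParser):
    """Collect the src attribute of every <iframe> in the pasted markup."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            self.sources.append(dict(attrs).get("src"))


def extract_soundcloud_src(embed_code):
    """Return the player URL if the input holds exactly one SoundCloud iframe, else None."""
    finder = IframeSrcFinder()
    finder.feed(embed_code)
    finder.close()
    if len(finder.sources) != 1 or not finder.sources[0]:
        return None
    src = finder.sources[0].strip()
    try:
        parts = urlsplit(src)
    except ValueError:
        return None
    # Compare the parsed hostname, not a substring, so that
    # "https://w.soundcloud.com@evil.example/" is rejected.
    if parts.scheme != "https" or parts.hostname != ALLOWED_HOST:
        return None
    if parts.path != ALLOWED_PATH:
        return None
    return src


def build_iframe(src):
    """Rebuild the iframe from a validated URL; only the URL comes from the user."""
    return (
        '<iframe width="100%" height="166" scrolling="no" frameborder="no" '
        f'src="{escape(src, quote=True)}"></iframe>'
    )
```

Only the validated URL would be stored. Everything else in the rendered iframe is fixed markup, so extra tags or attributes in the pasted embed code never reach the page.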

ASP.NET MVC - Regex to catch image and link but lose free text

I'm diving into Orchard CMS and ASP.NET MVC, and could do with a little help. I am consuming an RSS feed that consists of HTML (a link around an image) that I want to keep, followed by some text that I don't want.
Eg:
<img src="http://media-cache-ec3.pinterest.com/upload/65935582014430387_d5ueoRR6_b.jpg">Nice graphic design & typography
I figure the best way to do this is use a regex to detect the required HTML. However I don't have much experience of regex formatting, nor do I know how I should go about implementing the regex within my scenario. The code below is what I'm currently working with:
@using System.Xml.Linq
@{
    var feed = Model.Feed as XElement;
}
<ul>
@foreach(var item in feed
    .Element("channel")
    .Elements("item")
    .Take((int)Model.ItemsToDisplay)) {
    <li>@T(item.Element("description").Value)</li>
}
</ul>
So, I essentially have two questions (with the first being the most important):
1. How should I implement a regex to lose the unwanted free text?
2. What would the regex be that I need to do this?
I dealt with this by using CSS to hide the text.
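For anyone who still wants the regex route the question asks about, here is a minimal sketch of one possible pattern. It is written in Python and not in the Razor view, and `KEEP_PATTERN` and `keep_links_and_images` are names invented for the example. It assumes the markup is as simple as the samples shown here: a link wrapping an image, or a bare image, followed by plain text. A regex like this is fragile against nested or unusual HTML.

```python
import re

# Match either a whole link (with whatever it wraps) or a bare image tag.
KEEP_PATTERN = re.compile(
    r"<a\b[^>]*>.*?</a>"  # a complete <a ...>...</a> element
    r"|<img\b[^>]*>",     # or an <img ...> that is not inside a link
    re.IGNORECASE | re.DOTALL,
)


def keep_links_and_images(description_html):
    """Return only the link/image markup, discarding any text around it."""
    return "".join(KEEP_PATTERN.findall(description_html))
```

Everything the pattern does not match, including the trailing free text, is simply left out of the result.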

Regex with iframe in Yahoo! Pipes

I'm building a Yahoo! Pipe to pull an RSS feed from Reddit which links to some content in the description. I'm using a regex to match the href attribute of the anchor link in an item.description field. The regex I'm using is:
^.+?href="([^"]+)">\[link\].+?$
As a test, I set the replace to simply:
$1
and I see that the entire description field has been replaced with the URL. So far, so good.
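The behaviour described so far can be reproduced outside Pipes with a short sketch. It uses Python's `re` module, where the replacement is written `\1` instead of `$1`. `SAMPLE_DESCRIPTION` is a made-up stand-in for a Reddit item description (the real feed markup may differ), and `replace_with_link` is a hypothetical helper name.

```python
import re

# The pattern from the question, unchanged.
LINK_PATTERN = re.compile(r'^.+?href="([^"]+)">\[link\].+?$')

# Hypothetical single-line description with three anchors, one labelled [link].
SAMPLE_DESCRIPTION = (
    'submitted by <a href="http://example.com/user/someone">someone</a> '
    '<a href="http://example.com/article">[link]</a> '
    '<a href="http://example.com/comments">[comments]</a>'
)


def replace_with_link(description):
    """Replace the whole description with the href of the [link] anchor."""
    # The pattern is anchored with ^ and $, so the match covers the entire
    # description and the substitution leaves only the captured URL.
    return LINK_PATTERN.sub(r"\1", description)
```

With the sample above, `replace_with_link(SAMPLE_DESCRIPTION)` yields `http://example.com/article`. The lazy `.+?` skips past the first anchor because its href is not followed by `">[link]`.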
I then put the following in the replace field. The idea being to iframe the content that's linked to:
Content: <iframe src="$1">no iframe support</iframe> End
What I get out however is:
Content: no iframe support End
I've confirmed that this is also coming through in the pipe's output and not just in the Yahoo! Pipes debug console.
I've so far tried replacing my angle brackets with &lt; and &gt; entities. I've tried wrapping the entire thing in a <![CDATA[ ... ]]> block and still, I get nothing. If I break my iframe tag by removing an angle bracket, the broken content comes through fine, but if I have a well-formed iframe element, it vanishes, leaving the "no iframe support" text. Am I doing something wrong here, or is Yahoo! actively preventing me from using iframe tags in my generated pipe? A cursory search on Google isn't turning up anything related to this.
The pipe in question is here:
http://pipes.yahoo.com/pipes/pipe.info?_id=2ba41448cadd2347d86f377efd3d199f
This Pipes FAQ Question "Why does Pipes Strip <object> and <embed> tags... ?" shows that a certain amount of sanitization is performed, by placing content (at least certain content) into an iframe for the safety of RSS consumers - though it does not state it specifically, this probably also removes other iframes in order to avoid nesting and other work-arounds.
Yahoo is big enough that I doubt they have a weak sanitizer, but as an extremely long shot you might be able to fool it by nesting the iframe in a bunch of other tags (again, I doubt this will work). Also, depending on which step does the sanitization, adding part of the tag in one step and another part somewhere else might work (yet again, doubt overwhelms me).
Not sure what else to suggest, other than getting something else to consume and transform your RSS a little bit more (by fixing otherwise broken tags??) - but that's what you're using pipes for to begin with, isn't it? Idunno...
Good luck!
Pipes has a fanatical devotion to the RSS spec, and the spec says the description field is plain text only. HTML etc. is supposed to go in the content:encoded field, not that I've had much luck getting Pipes to do that.

Customizing Containable Content in Orchard CMS

I am currently trying to understand a bit more about how Orchard handles Lists of Custom Content Types and I have run into a bit of an issue.
I created a Content Type named Story, which has the following parts:
Body
Common
Containable
Route
I created a list that holds these items, and all I am attempting to do is style them in such a way:
Story Title
Story Description (Basically a truncated version of the body?)
However, I cannot seem to figure out how to do the following:
Get the Title to actually appear (Currently all that appears is the body and a more link)
Remove the "more" link (and change this to be the actual Title)
I have looked into changing the Placement.info, and have looked all over in an attempt to find where the "more" link is added in each of the items. Any help would be greatly appreciated.
I finally managed to figure it out - Thanks to the Designer Tools Module, which made it very simple to go look into what was going on behind the scenes during Page Generation.
Basically - all that was necessary to accomplish this was to make some minor changes to the Parts.Common.Body.Summary.cshtml file. (found via ../Core/Common/Views/)
Which initially resembles the following:
@{
    [~.ContentItem] contentItem = Model.ContentPart.ContentItem;
    string bodyHtml = Model.Html.ToString();
    var body = new HtmlString(Html.Excerpt(bodyHtml, 200).ToString()
        .Replace(Environment.NewLine,"</p>"+Environment.NewLine+"<p>"));
}
<p>@body @Html.ItemDisplayLink(T("more").ToString(), contentItem)</p>
However, after making a few changes (with the help of the Designer Tools), I changed it into the following:
@{
    [~.ContentItem] contentItem = Model.ContentPart.ContentItem;
    string bodyHtml = Model.Html.ToString();
    string title = Model.ContentPart.ContentItem.RoutePart.Title;
    string summary = Html.Excerpt(bodyHtml, 100) + "...";
}
<div class='story'>
    <p>
        @Html.ItemDisplayLink(title, contentItem)
    </p>
    <summary>
        @summary
    </summary>
</div>
Although it could easily be shortened a bit, it does make the styling quite a bit easier to handle. Anyway, I hope this helps :)
Alternatively, you can use the placement.info file in your theme to assign different fields to your Summary and Detail views. It's much simpler.
http://orchardproject.net/docs/Understanding-placement-info.ashx
But I used the same method you did until I discovered the placement.info file as well. It works and gives you a good understanding of how the system works, but the placement.info file seems easier.
Also, you probably don't want to be editing the view files in Core. I think you're meant to override views in your theme directory.